Research and Improvement on Content-Based Web Search Engine

Abstract:

The World Wide Web contains a vast amount of information. Obtaining the required resources quickly and accurately from the web through content-based search engines has become a research focus. Most current full-text web search tools, such as Lucene, a widely used open-source retrieval library in the information retrieval field, are purely keyword based. This may not be sufficient for users retrieving information from the web. In this paper, we employ a method to overcome the limitations of current full-text search engines, taking Lucene as the representative. We propose a Query Expansion and Information Retrieval approach that can help users acquire more accurate content from the web. The Query Expansion component finds candidate expansion words for the query word through WordNet, which contains synonyms in several different senses. In the Information Retrieval component, the query word and its candidate words are used together as the input to the search module to get the result items. Furthermore, the result items can be put into different classes based on the expansion. Experiments and their results are described in the latter part of this paper.
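The two components described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `TOY_SYNSETS` is a hypothetical hand-written table standing in for WordNet, `DOCS` is a made-up collection standing in for crawled pages, and a plain in-memory inverted index stands in for Lucene. The function names `build_index`, `expand_query`, and `search_by_sense` are likewise assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for WordNet: each query word maps to senses,
# and each sense lists a few synonyms. A real system would query WordNet.
TOY_SYNSETS = {
    "bank": {
        "financial institution": ["bank", "depository"],
        "river side": ["bank", "riverbank", "shore"],
    },
}

# Hypothetical document collection standing in for crawled web pages.
DOCS = {
    1: "the depository holds customer savings",
    2: "we walked along the riverbank at dusk",
    3: "the bank raised its interest rate",
    4: "waves reached the shore",
}


def build_index(docs):
    """Build a simple inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def expand_query(word, synsets):
    """Return {sense: candidate words}, always including the query word."""
    senses = synsets.get(word, {})
    if not senses:
        return {"unexpanded": [word]}
    return {sense: sorted(set(words) | {word}) for sense, words in senses.items()}


def search_by_sense(word, index, synsets):
    """Search with the query word plus its candidates, grouping hits by sense."""
    results = {}
    for sense, candidates in expand_query(word, synsets).items():
        hits = set()
        for term in candidates:
            hits |= index.get(term, set())
        results[sense] = sorted(hits)
    return results


index = build_index(DOCS)
grouped = search_by_sense("bank", index, TOY_SYNSETS)
for sense, doc_ids in grouped.items():
    print(sense, doc_ids)
```

In this toy collection a keyword-only lookup of "bank" matches a single document, while the expanded query also reaches documents that use only a synonym, and the sense labels give a simple way to put the result items into classes. A document containing the ambiguous query word itself appears under every sense, which is one reason a real system would need further disambiguation.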

Info:

Periodical:

Advanced Materials Research (Volumes 532-533)

Pages:

1282-1286

Online since:

June 2012

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved

References:

[1] http://en.wikipedia.org/wiki/Archie_search_engine

[2] http://www.google.com

[3] www.baidu.com

[4] http://lucene.apache.org

[5] Mohammad Azadnia, Web Information Retrieval Systems Integration Using Web Service, in ICCDA, 2010, V2-206.

[6] WordNet: An Electronic Lexical Database, Cambridge, Mass.: MIT Press, 1999, c1998.

[7] Jian Wan and Shengyi Pan, Performance Evaluation of Compressed Inverted Index in Lucene, in ICRCCS, 2009, p.178–181.

[8] Naskar, S.K. and Bandyopadhyay, S., Word Sense Disambiguation Using Extended WordNet, in ICCTA, 2007, p.446–451.

DOI: 10.1109/iccta.2007.134

[9] Safarkhani, B., Mohsenzadeh, M. and Rahmani, A.M., Improving Website User Model Automatically Using a Comprehensive Lexical Semantic Resource, in EBISS, 2009, p.1–5.

DOI: 10.1109/ebiss.2009.5138001

[10] http://crawler.archive.org