Topic Crawling Strategy Based on Wikipedia and Analysis of Pages' Similarity

Abstract:

Considering the weaknesses of existing topic crawling strategies, this paper proposes a new method based on Wikipedia and the analysis of page similarity. First, the topic is described using Wikipedia. Next, the downloaded web pages are processed. Finally, the priorities of the links are calculated from text relevance and an analysis of the web link structure. The results indicate that the new method outperforms traditional strategies in both search results and topic relevance, and merits wider adoption.
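The abstract does not give the scoring formulas, so the following is only a minimal sketch of how such a link-prioritization step might look, under stated assumptions. It assumes the topic is represented as a term-frequency vector built from Wikipedia article text, that "text relevance" is cosine similarity to that vector, and that a link's priority is a weighted blend of its parent page's relevance and its anchor text's relevance (loosely in the style of Shark-Search inherited scores). The names `tf_vector`, `cosine`, `link_priority`, `Frontier`, the weight `alpha`, and the example URLs are hypothetical illustrations, not the paper's method.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a real crawler would also stem and drop stop words.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_vector(text: str) -> Counter:
    # Raw term-frequency vector (hypothetical simplification; TF-IDF is a common alternative).
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term vectors; 0.0 if either is empty.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_priority(topic: Counter, page_text: str, anchor_text: str, alpha: float = 0.5) -> float:
    # Hypothetical blend: relevance of the page containing the link (inherited score)
    # plus relevance of the link's own anchor text, weighted by alpha.
    page_score = cosine(topic, tf_vector(page_text))
    anchor_score = cosine(topic, tf_vector(anchor_text))
    return alpha * page_score + (1 - alpha) * anchor_score


class Frontier:
    # Max-priority queue of URLs still to crawl; each URL is enqueued at most once.
    def __init__(self) -> None:
        self._heap: list[tuple[float, str]] = []
        self._seen: set[str] = set()

    def push(self, url: str, priority: float) -> None:
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated priority.
            heapq.heappush(self._heap, (-priority, url))

    def pop(self) -> tuple[str, float]:
        neg_priority, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self) -> int:
        return len(self._heap)


# Hypothetical topic description; in the paper's setting it would be derived from Wikipedia text.
topic = tf_vector("web crawler crawling search engine pages links index")
page = "A focused crawler downloads pages and follows links toward a topic."
frontier = Frontier()
frontier.push("http://example.org/a", link_priority(topic, page, "search engine crawler"))
frontier.push("http://example.org/b", link_priority(topic, page, "cooking recipes"))
best_url, best_score = frontier.pop()
print(best_url)  # http://example.org/a
```

Both links share the same parent page, so the on-topic anchor text decides the order. Setting `alpha` closer to 1 makes the crawler trust page-level relevance more than individual anchors; the right balance would have to be tuned empirically.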

Info:

Pages:

2407-2412

Online since:

November 2012

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
