Design of Theme Crawler for Web Forum

Abstract:

A web crawler is a web information extraction tool that downloads pages from the Internet for a search engine. The implementation strategy and operating efficiency of the crawling program directly influence the results of subsequent work. To address the shortcomings of ordinary crawlers, this paper proposes a practical, efficient, and precise theme (topic-focused) crawling method for bulletin board systems (BBS). The method is tailored to the characteristics of BBS sites in three areas: web page parsing, theme correlation analysis, and crawling strategy. It uses template configuration to analyze and crawl forum articles. The authors report that the method outperforms a general crawler in performance, accuracy, and comprehensiveness.
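The abstract names three ingredients: template-configured page parsing, theme correlation analysis, and a crawling strategy. The following is only a minimal sketch of how such pieces could fit together, not the paper's implementation. The `TEMPLATE` regexes, the `TOPIC_TERMS` vocabulary, the keyword-overlap score, the `threshold` value, and the in-memory `PAGES` forum are all hypothetical stand-ins chosen for illustration.

```python
import heapq
import re

# Hypothetical per-forum template: regexes that locate thread links and post bodies.
TEMPLATE = {
    "thread_link": re.compile(r'<a class="thread" href="([^"]+)">([^<]+)</a>'),
    "post_body": re.compile(r'<div class="post">(.*?)</div>', re.S),
}

# Assumed topic vocabulary; a real system would use a richer relevance model.
TOPIC_TERMS = {"crawler", "search", "index"}


def relevance(text):
    """Fraction of topic terms that occur in the text (a crude stand-in score)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & TOPIC_TERMS) / len(TOPIC_TERMS)


def crawl(fetch, board_url, threshold=0.3):
    """Best-first crawl of one board page; returns posts of on-topic threads."""
    board_html = fetch(board_url)
    queue = []
    for url, title in TEMPLATE["thread_link"].findall(board_html):
        score = relevance(title)
        if score >= threshold:
            # heapq is a min-heap, so push the negated score.
            heapq.heappush(queue, (-score, url))
    results = []
    while queue:
        neg_score, url = heapq.heappop(queue)
        posts = TEMPLATE["post_body"].findall(fetch(url))
        results.append((url, -neg_score, [p.strip() for p in posts]))
    return results


# Tiny in-memory "forum" so the sketch runs without network access.
PAGES = {
    "/board": (
        '<a class="thread" href="/t/1">Building a crawler and search index</a>'
        '<a class="thread" href="/t/2">Weekend hiking photos</a>'
        '<a class="thread" href="/t/3">Crawler politeness tips</a>'
    ),
    "/t/1": '<div class="post"> Use a frontier queue. </div>',
    "/t/2": '<div class="post">Nice view!</div>',
    "/t/3": '<div class="post">Respect robots.txt.</div><div class="post">Add delays.</div>',
}

results = crawl(PAGES.get, "/board")
```

In this sketch the off-topic thread is never fetched, and the remaining threads are visited in descending order of title relevance. Supporting another forum would mean swapping in a different `TEMPLATE` rather than changing the crawl loop.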

Info:

Pages:

1330-1333

Online since:

April 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved

