Design of Theme Crawler for Web Forum

Zhao Qiu; Ceng Jun Dai; Tao Liu

doi:10.4028/www.scientific.net/AMM.548-549.1330

Abstract:

A web crawler is a web information extraction tool that downloads pages from the Internet for a search engine. The implementation strategy and operating efficiency of the crawling program directly influence the results of subsequent work. To address the shortcomings of ordinary crawlers, this paper puts forward a practical and efficient precise theme crawling method for BBS forums. Based on the characteristics of BBS sites, the method makes targeted attempts in web page parsing, theme correlation analysis, and crawling strategy, and uses template configuration to analyze and crawl articles. The method outperforms a general crawler in performance, accuracy, and comprehensiveness.
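
The abstract names three components: template-configured page parsing, theme correlation analysis, and a crawling strategy. The following is a minimal illustrative sketch of how such pieces could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the `TEMPLATE` regexes and the HTML class names they match, the keyword-overlap `relevance` score, and the best-first `Frontier` queue are hypothetical stand-ins for details the abstract does not give.

```python
import heapq
import re

# Hypothetical per-forum template: regexes that locate the title and body
# of a post. A real forum would need its own patterns.
TEMPLATE = {
    "title": re.compile(r'<h1 class="post-title">(.*?)</h1>', re.S),
    "body": re.compile(r'<div class="post-body">(.*?)</div>', re.S),
}


def parse_post(html, template=TEMPLATE):
    """Extract the fields named in the template; missing fields become ''."""
    fields = {}
    for name, pattern in template.items():
        match = pattern.search(html)
        fields[name] = match.group(1).strip() if match else ""
    return fields


def relevance(text, theme_terms):
    """Fraction of theme terms that occur in the text (case-insensitive).

    This simple overlap score is an assumed stand-in for the paper's
    theme correlation analysis.
    """
    if not theme_terms:
        return 0.0
    words = set(re.findall(r"\w+", text.lower()))
    hits = sum(1 for term in theme_terms if term.lower() in words)
    return hits / len(theme_terms)


class Frontier:
    """Best-first URL queue: the highest-scoring unseen URL is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._count = 0  # tie-breaker that keeps insertion order stable

    def push(self, url, score):
        if url in self._seen:
            return  # each URL is queued at most once
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the best first.
        heapq.heappush(self._heap, (-score, self._count, url))
        self._count += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

In this sketch, a crawl loop would pop a URL from the `Frontier`, fetch the page, call `parse_post`, score the extracted text with `relevance`, and push the page's outgoing links with that score so that links found on on-theme posts are visited first. Supporting another forum would mean supplying a different template rather than changing the code.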

Info:

Periodical:

Applied Mechanics and Materials (Volumes 548-549)

Pages:

1330-1333

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.548-549.1330

Online since:

April 2014

Authors:

Zhao Qiu*, Ceng Jun Dai, Tao Liu

Keywords:

Theme Crawler, Web Crawler, Web Forum

* - Corresponding Author
