An Improved Crawler Algorithm Based on Hierarchical Structure Preservation

Zhi Feng Hao; Ze Bin Zhang; Zhao Quan Cai; Han Huang

doi:10.4028/www.scientific.net/KEM.474-476.2120

Abstract:

This paper proposes an improved web crawler algorithm that crawls more useful information, since the basic web crawler algorithm is inefficient and prone to crawling useless, repeated information. In the proposed algorithm, website URLs are stored hierarchically so as to preserve the website's overall topology, which turns the crisscrossing, complex URL system of a website from a graph structure into a tree structure. Experiments on an actual BBS website show that the algorithm is much better than the basic web crawler algorithm in crawling speed and in the usefulness of the downloaded information. Furthermore, it provides a well-performing structural model for incremental crawler algorithms.
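The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of reducing a link graph to a tree, not the authors' method. It assumes a breadth-first traversal in which each URL is stored once, under the page that first linked to it, and any later link to an already stored URL is skipped. The names `build_url_tree`, `depth`, and `LINK_GRAPH` are hypothetical, and the in-memory dictionary stands in for pages that a real crawler would fetch and parse.

```python
from collections import deque


def build_url_tree(link_graph, root):
    """Breadth-first walk that stores each URL once, under its first discoverer.

    link_graph is a hypothetical stand-in for fetched pages: it maps a URL
    to the list of URLs that page links to.
    """
    parent = {root: None}   # URL -> parent URL; also serves as the "seen" set
    children = {root: []}   # URL -> child URLs, in discovery order
    queue = deque([root])
    while queue:
        url = queue.popleft()
        for link in link_graph.get(url, []):
            if link in parent:
                # Already stored in the tree: a cross-link or back-link,
                # so it is not queued or downloaded again.
                continue
            parent[link] = url
            children[link] = []
            children[url].append(link)
            queue.append(link)
    return parent, children


def depth(parent, url):
    """Number of tree edges between url and the root."""
    d = 0
    while parent[url] is not None:
        url = parent[url]
        d += 1
    return d


# Assumed forum-like link graph with cross-links and links back to the root.
LINK_GRAPH = {
    "/": ["/board/1", "/board/2"],
    "/board/1": ["/thread/10", "/thread/11", "/"],
    "/board/2": ["/thread/11", "/thread/20", "/board/1"],
    "/thread/10": ["/board/1", "/thread/11"],
    "/thread/11": ["/"],
    "/thread/20": [],
}

parent, children = build_url_tree(LINK_GRAPH, "/")
```

In this sketch the `parent` map plays two roles: it records the hierarchy and it filters repeated URLs, so every page is visited at most once. A stored tree of this kind is also something an incremental crawler could compare against on a later pass, although how the paper does that is not stated in the abstract.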


Info:

Periodical: Key Engineering Materials (Volumes 474-476)

Pages: 2120-2124

DOI: https://doi.org/10.4028/www.scientific.net/KEM.474-476.2120

Online since: April 2011

Authors: Zhi Feng Hao, Ze Bin Zhang, Zhao Quan Cai, Han Huang

Keywords: Hierarchical Structure Preservation, URL Filter, Web Crawler

