An Improved Crawler Algorithm Based on Hierarchical Structure Preservation

Abstract:

This paper proposes an improved web crawler algorithm that crawls more useful information, because the basic web crawler algorithm is inefficient and tends to crawl useless, repeated information. In the proposed algorithm, website URLs are saved hierarchically so that the overall topology of each website is stored. This turns a website's complex, crisscrossing URL system from a graph structure into a tree structure. Experiments on an actual BBS website show that the algorithm performs much better than the basic web crawler algorithm in both crawling speed and the usefulness of the downloaded information. Furthermore, it provides a structural model for incremental crawler algorithms.
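The abstract does not spell out how the graph-to-tree conversion is done. A minimal sketch, assuming a breadth-first traversal in which each URL is stored only once and attached to the first page that links to it, is shown below. This produces a BFS spanning tree of the link graph and is one standard way to get a tree from a graph; it is not presented as the authors' exact method. The function `build_url_tree`, the in-memory `links` dictionary (a hypothetical stand-in for fetching and parsing pages), the `max_depth` limit, and the `bbs.example.com` URLs are all illustrative assumptions.

```python
from collections import deque
from urllib.parse import urljoin, urldefrag


def build_url_tree(start, links, max_depth=3):
    """Crawl a link graph breadth-first, keeping each URL exactly once.

    Hypothetical sketch: `links` maps a page URL to the raw hrefs found on it,
    standing in for real page fetching and HTML parsing.
    Returns (parent, depth): parent[url] is the page the URL was first
    discovered from (None for the root), so the result is a tree.
    """
    start = urldefrag(start)[0]
    parent = {start: None}
    depth = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if depth[url] >= max_depth:
            continue  # do not expand pages at the depth limit
        for href in links.get(url, []):
            # Resolve relative links and drop "#fragment" parts so that
            # variants of the same page map to one URL.
            child = urldefrag(urljoin(url, href))[0]
            if child in parent:
                continue  # already in the tree: skip the repeated URL
            parent[child] = url
            depth[child] = depth[url] + 1
            queue.append(child)
    return parent, depth


# Hypothetical BBS-like link graph with cycles and duplicate links.
ROOT = "http://bbs.example.com/"
links = {
    "http://bbs.example.com/": ["forum/1", "forum/2"],
    "http://bbs.example.com/forum/1": ["../thread/10", "/forum/2", "/"],
    "http://bbs.example.com/forum/2": ["/thread/10#reply3", "/thread/20"],
    "http://bbs.example.com/thread/10": ["/forum/1"],
}

parent, depth = build_url_tree(ROOT, links)
for url in sorted(parent, key=lambda u: (depth[u], u)):
    print(depth[url], url, "<-", parent[url])
```

Because every URL gets exactly one parent, back-links (such as a thread linking to its forum, or a page linking to the root) do not create new entries. The stored `parent` and `depth` maps record the site hierarchy level by level, which is the kind of structure an incremental crawler could revisit instead of re-traversing the whole graph.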

Info:

Periodical:

Key Engineering Materials (Volumes 474-476)

Edited by:

Garry Zhu

Pages:

2120-2124

DOI:

10.4028/www.scientific.net/KEM.474-476.2120

Citation:

Z. F. Hao et al., "An Improved Crawler Algorithm Based on Hierarchical Structure Preservation", Key Engineering Materials, Vols. 474-476, pp. 2120-2124, 2011

Online since:

April 2011

