Paper Title:
An Improved Crawler Algorithm Based on Hierarchical Structure Preservation
  Abstract

This paper proposes an improved web crawler algorithm that gathers more useful information, since the basic web crawler algorithm is inefficient and prone to crawling useless, repeated information. In the proposed algorithm, website URLs are saved hierarchically to preserve the website's overall topology, which turns the crisscrossing, complex URL system of a website from a graph structure into a tree structure. Experiments on a real BBS website show that the algorithm performs much better than the basic web crawler algorithm in crawling speed and in the usefulness of the downloaded information. Furthermore, it provides an effective structural model for incremental crawler algorithms.
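
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of turning a link graph into a tree, not the authors' method. It assumes a breadth-first traversal in which each URL is stored once, under the page that first linked to it. The function `build_url_tree`, the `get_links` callback, and the `LINKS` forum graph are all hypothetical names introduced for illustration.

```python
from collections import deque


def build_url_tree(start_url, get_links):
    """Breadth-first traversal that records each URL exactly once,
    under the page that first linked to it (an assumed policy)."""
    parent = {start_url: None}   # URL -> parent URL in the tree
    depth = {start_url: 0}       # URL -> hierarchy level
    children = {start_url: []}   # URL -> child URLs in the tree
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for link in get_links(url):
            if link in parent:
                # Already placed in the tree: skip the repeated link.
                continue
            parent[link] = url
            depth[link] = depth[url] + 1
            children[link] = []
            children[url].append(link)
            queue.append(link)
    return parent, depth, children


# Hypothetical link graph of a small forum, with back-links and cross-links.
LINKS = {
    "/": ["/board/1", "/board/2"],
    "/board/1": ["/thread/10", "/", "/board/2"],
    "/board/2": ["/thread/10", "/thread/20", "/"],
    "/thread/10": ["/board/1", "/"],
    "/thread/20": ["/board/2"],
}

parent, depth, children = build_url_tree("/", lambda u: LINKS.get(u, []))
for url in sorted(depth, key=depth.get):
    print("  " * depth[url] + url)
```

In this sketch the back-links and cross-links of the graph are dropped, so every URL is visited once and the stored structure has exactly one fewer edge than it has nodes, which is what makes it a tree. A real crawler would replace `get_links` with page fetching and link extraction.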

  Info
Periodical
Key Engineering Materials (Volumes 474-476)
Edited by
Garry Zhu
Pages
2120-2124
DOI
10.4028/www.scientific.net/KEM.474-476.2120
Citation
Z. F. Hao, Z. B. Zhang, Z. Q. Cai, H. Huang, "An Improved Crawler Algorithm Based on Hierarchical Structure Preservation", Key Engineering Materials, Vols. 474-476, pp. 2120-2124, 2011
Online since
April 2011

