Design of the Distributed Web Crawler

Abstract:

At the current scale of the Internet, a single web crawler cannot visit the entire web within a reasonable time frame. We therefore develop a distributed web crawler system. Our design considers two levels of parallelism: multithreading within each node, and distributed parallelism across nodes. We focus on the distribution and parallelism across nodes, and address two issues of the distributed web crawler: the crawl strategy and dynamic configuration. Experimental results show that a hash function based on the web site achieves the goals of the distributed web crawler. While pursuing load balance across the system, we also aim to reduce communication and management overhead as much as possible.
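
The abstract reports that partitioning URLs by a hash of the web site works well but does not specify the hash function. The following is a minimal sketch under stated assumptions: the URL's host name identifies the site (so different ports on the same host count as one site), MD5 is used only as a stable, process-independent hash, and sites map to a fixed number of nodes by modulo. The function names `site_of`, `assign_node`, and `partition` are hypothetical and are not taken from the paper.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Return the URL's host name as the site key (assumption: host == site)."""
    host = urlsplit(url).hostname  # lowercased by urllib; None if the URL has no host
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    return host


def assign_node(url: str, num_nodes: int) -> int:
    """Map a URL to a crawler node by hashing its site, not its full path.

    MD5 is used instead of Python's built-in hash(), whose string hashing is
    randomized per process and would give different nodes on different machines.
    """
    digest = hashlib.md5(site_of(url).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(urls: list[str], num_nodes: int) -> dict[int, list[str]]:
    """Group URLs by the node responsible for their site."""
    buckets: dict[int, list[str]] = defaultdict(list)
    for url in urls:
        buckets[assign_node(url, num_nodes)].append(url)
    return dict(buckets)


if __name__ == "__main__":
    urls = [
        "http://example.com/a",
        "http://example.com/b/c",
        "http://EXAMPLE.com:8080/d",
        "http://example.org/",
    ]
    for node, assigned in sorted(partition(urls, 4).items()):
        print(node, assigned)
```

Because every URL of a site lands on the same node, links that stay within a site can be queued locally without inter-node messages, which is one way site-based hashing can reduce communication overhead. Note that plain modulo hashing reassigns most sites whenever the number of nodes changes; if nodes are added or removed at run time, a scheme such as consistent hashing is a common alternative, although the source does not say which approach the authors used.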

Info:

Periodical:

Advanced Materials Research (Volumes 204-210)

Edited by:

Helen Zhang, Gang Shen and David Jin

Pages:

1454-1458

DOI:

10.4028/www.scientific.net/AMR.204-210.1454

Citation:

X. Chen et al., "Design of the Distributed Web Crawler", Advanced Materials Research, Vols. 204-210, pp. 1454-1458, 2011

Online since:

February 2011