Design of the Distributed Web Crawler

Xing Chen; Wei Jiang Li; Tie Jun Zhao; Xing Hai Piao

doi:10.4028/www.scientific.net/AMR.204-210.1454

Abstract:

At the current scale of the Internet, a single web crawler cannot visit the entire web within a reasonable time frame, so we develop a distributed web crawler system. Our design considers parallelism at two levels: multi-threading within each node, and distributed parallelism across nodes. We focus on the distribution and parallelism between nodes, and address two issues of the distributed web crawler: the crawl strategy and dynamic configuration. The experimental results show that a hash function based on the web site achieves the goals of the distributed web crawler. While pursuing load balance across the system, the design also seeks to reduce communication and management overhead as much as possible.
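The abstract does not specify the hash function itself. As a minimal sketch, assuming a fixed set of crawler nodes numbered 0 to N-1 and CRC32 of the URL's host name as the site hash (both are illustrative assumptions, not the authors' stated method), site-based assignment of URLs to nodes could look like this in Python. The helper names `site_of`, `assign_node`, and `partition` are hypothetical.

```python
import zlib
from collections import Counter
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Return the host name of a URL, used here as the 'web site' hash key."""
    host = urlsplit(url).hostname  # .hostname is already lowercased
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    return host


def assign_node(url: str, num_nodes: int) -> int:
    """Map a URL to a crawler node by hashing its site, not the full URL.

    Assumption: CRC32 is chosen because it gives the same value on every
    process and machine, unlike Python's built-in hash(), which is
    randomized per process for strings.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return zlib.crc32(site_of(url).encode("utf-8")) % num_nodes


def partition(urls, num_nodes):
    """Group URLs into per-node work lists (hypothetical helper)."""
    buckets = {n: [] for n in range(num_nodes)}
    for url in urls:
        buckets[assign_node(url, num_nodes)].append(url)
    return buckets


if __name__ == "__main__":
    urls = [
        "http://example.com/a.html",
        "http://example.com/b/c.html",
        "http://www.example.org/index.html",
        "https://news.example.net/2011/02/story",
        "http://EXAMPLE.com/d.html",
    ]
    for node, work in partition(urls, num_nodes=4).items():
        print(node, work)
    # Every example.com page maps to the same node, so links found within
    # that site stay local instead of being forwarded to another node.
    print(Counter(assign_node(u, 4) for u in urls))
```

Hashing by site rather than by full URL keeps intra-site links on one node, which cuts inter-node communication, at the cost of some imbalance when sites differ greatly in size. Note that with plain modulo hashing, changing the number of nodes remaps most sites, which matters for dynamic configuration; the abstract does not say how the authors handle that case.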

Info:
Periodical: Advanced Materials Research (Volumes 204-210)
Pages: 1454-1458

Online since: February 2011

Keywords: Distribution, Multi-Thread, Web Crawler
