Research and Implementation of Topic Crawler Based on Hadoop
Abstract:
This article proposes a distributed topic-focused crawler built on HDFS and MapReduce, the two core technologies of Hadoop. The crawler offers distributed processing capability, scalability, and high reliability. The article analyses the topic relevance of webpages using conceptual analysis and implements webpage crawling and updating with MapReduce. Finally, experiments verify the performance, scalability, and reliability of the system.
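The crawl step summarized above can be pictured as a Hadoop MapReduce job. The following is a minimal, hypothetical sketch, not code from the paper: a map-only job whose mapper fetches each seed URL from an input list and writes the page content to HDFS. All class names, the input layout, and the fetch logic are illustrative assumptions.

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Hypothetical sketch: each input line is a URL; the mapper fetches
    // the page and emits (url, page content) to HDFS.
    public class CrawlJob {

        public static class FetchMapper
                extends Mapper<LongWritable, Text, Text, Text> {

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String url = value.toString().trim();
                if (url.isEmpty()) {
                    return;
                }
                try (InputStream in = new URL(url).openStream()) {
                    // Read the whole page (Java 9+); a real crawler would
                    // bound the size and extract links for the next round.
                    byte[] body = in.readAllBytes();
                    context.write(new Text(url),
                            new Text(new String(body, StandardCharsets.UTF_8)));
                } catch (IOException e) {
                    // Skip unreachable pages; a production crawler would
                    // record the failure for later retry.
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "topic crawl");
            job.setJarByClass(CrawlJob.class);
            job.setMapperClass(FetchMapper.class);
            job.setNumReduceTasks(0);           // map-only: fetch and store
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // seed URL list
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // fetched pages
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Because each mapper works on a disjoint slice of the URL list, adding nodes scales the crawl horizontally, which is the distributed-processing property the abstract claims.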
Info:
Pages: 1896-1900
Online since: September 2014
Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved.