Near-Replicas of Web Pages Eliminating Repetitive Algorithms Based on MD5

Abstract:

The development of the internet and the exponential growth of online information have produced a large number of duplicated pages, which lowers the recall and precision of retrieval and reduces retrieval efficiency. The accuracy of duplicate removal therefore influences the quality of a search engine. Based on a structural description of page text, this paper proposes an improved duplicate-elimination algorithm for near-replica web pages that is based on MD5. Experiments show that the method improves both recall and precision.
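The preview does not give the details of the algorithm, so the following is only a minimal sketch of the general idea of MD5-based near-replica detection, not the authors' method. It assumes that a page's text is split into paragraphs, that each normalized paragraph is hashed with MD5, and that two pages count as near-replicas when the Jaccard overlap of their digest sets reaches a threshold. The names `block_fingerprints`, `similarity`, `deduplicate` and the default threshold of 0.8 are hypothetical choices made for illustration.

```python
import hashlib
import re


def block_fingerprints(text):
    """Return the set of MD5 digests of the normalized paragraphs of `text`.

    Assumption: paragraphs are separated by blank lines; case and
    whitespace differences are ignored before hashing.
    """
    digests = set()
    for block in re.split(r"\n\s*\n", text):
        normalized = " ".join(block.lower().split())
        if normalized:
            digests.add(hashlib.md5(normalized.encode("utf-8")).hexdigest())
    return digests


def similarity(a, b):
    """Jaccard overlap of two fingerprint sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(pages, threshold=0.8):
    """Keep the first page of each near-replica group.

    `pages` is an iterable of (page_id, text) pairs; the threshold is an
    illustrative value, not one taken from the paper.
    """
    kept = []  # (page_id, fingerprints) of the pages retained so far
    for page_id, text in pages:
        fingerprints = block_fingerprints(text)
        if all(similarity(fingerprints, other) < threshold for _, other in kept):
            kept.append((page_id, fingerprints))
    return [page_id for page_id, _ in kept]
```

Hashing whole paragraphs keeps the comparison cheap, because each page is reduced to a small set of fixed-length digests, but any change inside a paragraph alters its digest. A finer block size or a lower threshold would tolerate more edits at the cost of more false matches.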

Info:

Periodical:

Advanced Materials Research (Volumes 532-533)

Pages:

1752-1756

Online since:

June 2012

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
