A Improved Topics Search Algorithm Based on PSO Strategy for Web Mining

Abstract:

The HITS algorithm assigns the same weight to every link between Web pages, which results in topic drift. In this paper, a new focused crawling approach based on the PSO algorithm, PSOHITS, is proposed. The method selectively seeks out pages that are relevant to a pre-defined set of topics using the PSO algorithm, increases the chance of crawling pages that follow a page with low content relevance, and broadens the crawler's relevant-search scope. Meanwhile, hyperlink metadata is used to predict the topic relevance of the page a link points to, which speeds up crawling. Experiments show that the proposed algorithm improves the relevance ratio by 15% to 36%. Furthermore, it avoids topic drift well and improves the accuracy of information collection. It has important theoretical and practical value for search engine research.
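To make the abstract's first claim concrete, here is a minimal Python sketch of HITS in which each link carries a weight. It is not the paper's PSOHITS method, whose details are not given in this preview; it only illustrates how uniform link weights can let a heavily linked off-topic page outrank an on-topic one, and how relevance-based weights change that. The toy graph, the `relevance` scores, and the function name `weighted_hits` are all assumptions made for illustration.

```python
from math import sqrt


def weighted_hits(links, weight, iterations=50):
    """Run HITS with per-link weights (illustrative sketch only).

    links:  dict mapping a page to the list of pages it links to.
    weight: function (src, dst) -> non-negative link weight.
    Returns (authority, hub) dicts, each L2-normalised.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: weighted sum of the hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += weight(src, dst) * hub[src]
        norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}
        # Hub: weighted sum of the authority scores of pages linked to.
        new_hub = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += weight(src, dst) * auth[dst]
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}
    return auth, hub


# Hypothetical toy graph: the off-topic page has more in-links.
links = {
    "portal": ["on_topic", "off_topic"],
    "ad_page_1": ["off_topic"],
    "ad_page_2": ["off_topic"],
}
# Hypothetical topic-relevance scores for the link targets.
relevance = {"on_topic": 0.9, "off_topic": 0.1}

# Classic HITS: every link has the same weight.
uniform_auth, _ = weighted_hits(links, lambda src, dst: 1.0)
# Weighted variant: a link counts in proportion to its target's relevance.
weighted_auth, _ = weighted_hits(links, lambda src, dst: relevance[dst])

print("uniform :", uniform_auth["on_topic"] > uniform_auth["off_topic"])
print("weighted:", weighted_auth["on_topic"] > weighted_auth["off_topic"])
```

With uniform weights the off-topic page receives the higher authority score because it has more in-links; with the assumed relevance weights the on-topic page ranks first. How PSOHITS actually derives its weights is not stated in the abstract.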

Info:

Periodical: Key Engineering Materials (Volumes 439-440)

Pages: 1481-1486

Online since: June 2010

Copyright: © 2010 Trans Tech Publications Ltd. All Rights Reserved
