Research of Information Retrieval Based on Web Page Segmentation

Abstract:

This paper presents a Web information retrieval algorithm based on Web page segmentation. The key idea is to segment each Web page into distinct topic areas, or segments, according to its HTML tags and content, exploiting the fact that Web pages are semi-structured. The algorithm first builds an HTML tag tree and then merges nodes in the tree according to content similarity and visual similarity. During retrieval and ranking, it uses the segmentation information to rank the relevant pages. Experimental results show that the method significantly improves search precision, and it may serve as a useful reference for the design of future search engines.
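The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so everything here is an assumption made for illustration. The tag tree is flattened into block-level text nodes, adjacent blocks are merged using cosine similarity of term counts as a stand-in for content similarity, visual similarity is not modeled at all, and a page is scored by its best-matching segment. The names `BLOCK_TAGS`, `BlockCollector`, `segment_page`, `page_score`, and the `threshold` value are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
import math
import re

# Assumed set of tags that delimit candidate segments (not from the paper).
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "section", "article"}


class BlockCollector(HTMLParser):
    """Collects the text under each block-level tag, in document order."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # finished text blocks
        self._buf = []    # text pieces of the block currently being read

    def _flush(self):
        text = " ".join(self._buf).strip()
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if data.strip():
            self._buf.append(data.strip())

    def close(self):
        super().close()
        self._flush()


def term_vector(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def segment_page(html, threshold=0.2):
    """Merge adjacent blocks whose content similarity reaches the threshold."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    segments = []
    for block in parser.blocks:
        if segments and cosine(term_vector(segments[-1]), term_vector(block)) >= threshold:
            segments[-1] += " " + block
        else:
            segments.append(block)
    return segments


def page_score(query, segments):
    """Score a page by its best-matching segment instead of the whole page."""
    q = term_vector(query)
    return max((cosine(q, term_vector(s)) for s in segments), default=0.0)


if __name__ == "__main__":
    html = (
        "<div><p>python web crawler tutorial</p>"
        "<p>crawler tutorial for python beginners</p>"
        "<p>cheap flights and hotel deals</p></div>"
    )
    segments = segment_page(html)
    print(segments)
    print(page_score("hotel deals", segments))
```

Scoring by the best segment rather than the whole page keeps an off-topic region of the page from diluting the match, which is one plausible way segmentation could raise precision. A fuller version would walk an actual tag tree and include a visual similarity measure, as the abstract describes.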

Info:

Pages: 4928-4931

Online since: October 2012

Copyright: © 2012 Trans Tech Publications Ltd. All Rights Reserved
