An Improved Text Similarity Calculation Algorithm Based on VSM

Abstract:

Text similarity calculation is a key technology in text clustering, intelligent Web retrieval, and natural language processing. Because the traditional text similarity calculation algorithm does not consider the effect of the feature words that two texts share, it sometimes produces inaccurate results. To solve this problem, this paper presents an improved text similarity calculation algorithm. Since the number of shared feature words reflects, to some extent, how similar two texts are, the improved algorithm adds a coverage parameter, which effectively reduces interference from texts with lower similarity. Simulation and experimental results verify the correctness and effectiveness of the improved algorithm.
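
The abstract does not give the improved formula, so the following Python sketch only illustrates the general idea under stated assumptions. It computes the traditional VSM cosine similarity and then scales it by a coverage factor. The coverage definition used here (shared feature words divided by all distinct feature words), the multiplicative combination, and the function names are hypothetical choices for illustration, not the paper's definitions.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Traditional VSM similarity: cosine between two term-weight vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def coverage(a: Counter, b: Counter) -> float:
    """ASSUMPTION: coverage = shared feature words / all distinct feature words.

    The paper's actual coverage parameter may be defined differently.
    """
    all_terms = set(a) | set(b)
    if not all_terms:
        return 0.0
    return len(set(a) & set(b)) / len(all_terms)


def coverage_weighted_similarity(a: Counter, b: Counter) -> float:
    """ASSUMPTION: combine the two measures by simple multiplication."""
    return cosine_similarity(a, b) * coverage(a, b)


# Two texts that share a single, heavily weighted feature word.
text_a = Counter({"data": 10, "cat": 1})
text_b = Counter({"data": 10, "dog": 1})

plain = cosine_similarity(text_a, text_b)
weighted = coverage_weighted_similarity(text_a, text_b)
print(f"cosine only: {plain:.3f}, with coverage: {weighted:.3f}")
```

In this example the two texts share only one of three distinct feature words, yet the plain cosine score is close to 1 because that word dominates both vectors. Scaling by coverage lowers the score, which is the kind of interference the abstract says the improved algorithm is meant to reduce.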

Info:

Periodical: Advanced Materials Research (Volumes 225-226)

Pages: 1105-1108

Online since: April 2011

Copyright: © 2011 Trans Tech Publications Ltd. All Rights Reserved
