An Improved Text Similarity Calculation Algorithm Based on VSM
Text similarity calculation is a key technology in text clustering, intelligent Web retrieval, natural language processing, and related fields. Because the traditional text similarity calculation algorithm does not consider the effect of the feature words that two texts share, it can sometimes produce inaccurate results. To solve this problem, this paper proposes an improved text similarity calculation algorithm. Since the number of shared feature words reflects, to some extent, how similar two texts are, the improved algorithm adds a coverage parameter, which effectively reduces interference from texts with low similarity. Simulation and experimental results verify the correctness and effectiveness of the improved algorithm.
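The abstract does not give the exact form of the coverage parameter. The following is a minimal sketch under stated assumptions: texts are term-frequency vectors, the traditional measure is the standard VSM cosine similarity, and coverage is taken to be the ratio of shared feature words to all distinct feature words (a Jaccard ratio), multiplied into the cosine score. The names `coverage` and `improved_similarity`, and this particular combination, are hypothetical illustrations and are not the authors' formulation.

```python
import math
from typing import Mapping

# A text is represented as a VSM term-weight vector: feature word -> weight
# (plain term frequencies here, for simplicity).
Vector = Mapping[str, float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Standard VSM cosine similarity between two term-weight vectors."""
    shared = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def coverage(a: Vector, b: Vector) -> float:
    """ASSUMED coverage parameter: shared feature words / all distinct feature
    words (a Jaccard ratio). The paper's actual definition is not given here."""
    union = a.keys() | b.keys()
    if not union:
        return 0.0
    return len(a.keys() & b.keys()) / len(union)


def improved_similarity(a: Vector, b: Vector) -> float:
    """Hypothetical combination: cosine similarity scaled by coverage."""
    return cosine_similarity(a, b) * coverage(a, b)


if __name__ == "__main__":
    A = {"model": 10, "a": 1, "b": 1, "c": 1}
    B = {"model": 10, "x": 1, "y": 1, "z": 1}  # shares only one dominant word with A
    C = {"model": 3, "a": 2, "b": 2, "c": 2}   # shares every feature word with A
    for name, other in (("B", B), ("C", C)):
        print(name,
              round(cosine_similarity(A, other), 3),
              round(improved_similarity(A, other), 3))
```

In this toy example, B gets the higher plain cosine score with A (about 0.97) because of a single heavily weighted shared word, even though it shares only one of seven distinct feature words. Once the assumed coverage factor is applied, C, which shares all four of A's feature words, ranks above B. This is the kind of interference from low-similarity texts that the abstract describes.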
Helen Zhang, Gang Shen and David Jin
L. Li et al., "An Improved Text Similarity Calculation Algorithm Based on VSM", Advanced Materials Research, Vols. 225-226, pp. 1105-1108, 2011