An Improved Method of Short Text Feature Extraction Based on Words Co-Occurrence

Abstract:

In Chinese text clustering, short text differs markedly from traditional long text, chiefly in its low word frequencies. As a result, traditional text feature extraction and weight calculation methods are not directly suitable for short text clustering. To solve the problem of clustering drift in short text segments, this paper proposes a feature extraction method that improves weight calculation using word co-occurrence. Experiments show that the method achieves better performance in Chinese short-text clustering than the traditional TF-IDF method.
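The abstract does not give the paper's weighting formula, so the following is only a minimal sketch of the general idea, not the authors' method. It computes plain TF-IDF as the baseline, counts document-level word co-occurrence, and then scales each TF-IDF weight by a normalized co-occurrence strength. The function names, the `alpha` parameter, and the multiplicative boost are all assumptions made for illustration, and the input is assumed to be already segmented into word lists.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Baseline TF-IDF: tf = count / document length, idf = log(N / df)."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        weights.append(
            {w: (c / len(d)) * math.log(n / df[w]) for w, c in tf.items()}
        )
    return weights


def cooccurrence_counts(docs):
    """Count the documents in which each unordered word pair appears together."""
    pairs = Counter()
    for d in docs:
        for a, b in combinations(sorted(set(d)), 2):
            pairs[(a, b)] += 1
    return pairs


def cooccurrence_weighted(docs, alpha=0.5):
    """Hypothetical reweighting (an assumption, not the paper's formula):
    scale each TF-IDF weight by 1 + alpha * normalized co-occurrence strength."""
    base = tf_idf(docs)
    strength = Counter()
    for (a, b), c in cooccurrence_counts(docs).items():
        strength[a] += c
        strength[b] += c
    max_s = max(strength.values(), default=1)
    return [
        {w: v * (1 + alpha * strength[w] / max_s) for w, v in doc.items()}
        for doc in base
    ]


# Toy corpus of pre-segmented short texts (illustrative data only).
docs = [
    ["phone", "battery", "cheap"],
    ["phone", "battery", "screen"],
    ["movie", "ticket"],
]
weights = cooccurrence_weighted(docs)
```

In short texts almost every word occurs once, so the term-frequency factor barely separates words; a co-occurrence term is one way to bring in extra signal from words that repeatedly appear together across texts.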

Info:

Pages: 842-845

Online since: February 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
