Short Text Clustering Algorithm with Feature Keyword Expansion


Abstract:

To address keyword sparsity and similarity drift in short texts, this paper proposes a short text clustering algorithm with feature keyword expansion (STCAFKE). The method expands feature keywords using HowNet and clusters the texts by combining the K-means algorithm with a density-based algorithm. Feature keyword expansion increases the number of keywords in each text and enriches its semantic features, which makes short texts easier to cluster. Experimental results show that the algorithm improves short text clustering quality in terms of precision and recall.
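The abstract gives no implementation details, so the following is only a minimal sketch of the general idea under stated assumptions. HowNet is replaced by a tiny hypothetical lexicon (`TOY_LEXICON`), and the "density algorithm" is assumed to select high-density, well-separated texts as initial K-means centres; the paper may combine the two steps differently. All function names and the `radius` parameter are illustrative, not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical stand-in for HowNet: maps a keyword to related concept words.
# The actual method derives expansions from HowNet; this table is an assumption.
TOY_LEXICON = {
    "car": ["vehicle", "auto"], "auto": ["vehicle", "car"], "truck": ["vehicle"],
    "apple": ["fruit"], "banana": ["fruit"], "pear": ["fruit"],
}

def expand(tokens):
    """Feature keyword expansion: add lexicon-related words to a short text."""
    expanded = list(tokens)
    for t in tokens:
        expanded.extend(TOY_LEXICON.get(t, []))
    return Counter(expanded)

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def density_seeds(vecs, k, radius=0.3):
    """Assumed density step: pick dense texts that are dissimilar to earlier seeds."""
    dens = [sum(1 for j, w in enumerate(vecs) if j != i and cosine(v, w) >= radius)
            for i, v in enumerate(vecs)]
    order = sorted(range(len(vecs)), key=lambda i: -dens[i])
    seeds = []
    for i in order:
        if all(cosine(vecs[i], vecs[s]) < radius for s in seeds):
            seeds.append(i)
        if len(seeds) == k:
            return seeds
    for i in order:  # fall back if too few well-separated seeds exist
        if len(seeds) == k:
            break
        if i not in seeds:
            seeds.append(i)
    return seeds

def centroid(members):
    """Mean term-weight vector of a cluster."""
    total = Counter()
    for m in members:
        total.update(m)
    return {w: c / len(members) for w, c in total.items()}

def cluster_short_texts(texts, k, iters=10):
    """Expand keywords, seed with the density step, then run cosine K-means."""
    vecs = [expand(t) for t in texts]
    centers = [dict(vecs[i]) for i in density_seeds(vecs, k)]
    labels = [0] * len(vecs)
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vecs]
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels

texts = [["car", "truck"], ["auto"], ["truck"], ["apple"], ["banana", "pear"], ["pear"]]
print(cluster_short_texts(texts, k=2))  # [0, 0, 0, 1, 1, 1]
```

Without expansion, the text `["auto"]` shares no term with any other text and has zero similarity to all of them; after expansion it picks up `vehicle` and `car` and lands in the vehicle cluster. This illustrates how expansion can counter keyword sparsity.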


Info:

Periodical: Advanced Materials Research (Volumes 532-533)

Pages: 1716-1720

Online since: June 2012

Copyright: © 2012 Trans Tech Publications Ltd. All Rights Reserved
