A Text Clustering Algorithms Based on Hidden Markov Model



A clustering algorithm based on a probability model constructs a model for each cluster and calculates the probability that each text falls under each model in order to decide which cluster the text belongs to; this conveniently represents the abstract structure of the clusters from a global perspective. This paper combines the hidden Markov model and the k-means clustering algorithm to realize text clustering. The k-means algorithm first produces an initial clustering result, which serves as the initial probability model of a hidden Markov model. A probability transition matrix is then constructed to predict every step of the clustering iteration; when the difference between two successive probability transition matrices is 0, clustering ends. The algorithm views every cluster in the document clustering process from a global perspective, avoids repeating the clustering process, and effectively improves the clustering algorithm.
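The iteration outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the preview does not say how the probability transfer (transition) matrix is built, so the sketch assumes it is the row-normalized count of documents moving from cluster i to cluster j between two consecutive k-means steps. The function names (`assign`, `update_centers`, `transition_matrix`, `cluster`) and the dense document-vector input `X` are hypothetical.

```python
import numpy as np

def assign(X, centers):
    # Squared Euclidean distance from every document vector to every center.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def update_centers(X, labels, centers):
    # Standard k-means update; an empty cluster keeps its previous center.
    new = centers.copy()
    for j in range(len(centers)):
        members = X[labels == j]
        if len(members):
            new[j] = members.mean(axis=0)
    return new

def transition_matrix(prev, curr, k):
    # ASSUMPTION: entry (i, j) is the fraction of documents that were in
    # cluster i at the previous step and are in cluster j at the current one.
    counts = np.zeros((k, k))
    np.add.at(counts, (prev, curr), 1.0)
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

def cluster(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initial clustering from k-means with randomly chosen documents as centers.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = assign(X, centers)
    prev_T = None
    T = np.eye(k)
    for _ in range(max_iter):
        centers = update_centers(X, labels, centers)
        new_labels = assign(X, centers)
        T = transition_matrix(labels, new_labels, k)
        labels = new_labels
        # Stop when two successive transition matrices differ by 0.
        if prev_T is not None and np.abs(T - prev_T).max() == 0:
            break
        prev_T = T
    return labels, T
```

Under this reading, once no document changes cluster the matrix settles to the identity on the non-empty clusters, so two successive matrices coincide and the loop ends. Real text input would first need to be turned into vectors (for example term-frequency weights), a step the preview does not describe.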



Edited by:

Robin G. Qiu and Yongfeng Ju






W. Li and M. A. Li, "A Text Clustering Algorithms Based on Hidden Markov Model", Applied Mechanics and Materials, Vols. 135-136, pp. 1155-1158, 2012

Online since:

October 2011





