A Combined Method for Chinese Micro-Blogging Topic Tracking

Abstract:

To address Chinese micro-blogging topic tracking, a method combining the LDA model with Bagging from ensemble learning is proposed. The method first applies LDA hidden topic modeling, which effectively alleviates the data sparsity of short texts. It then uses the C4.5 decision tree as the weak classifier, obtains multiple training sets by resampling the examples, combines the resulting classifiers according to a voting rule, and finally obtains the topic similarity of micro-blog posts. Experiments show that, compared with a method based on a single vector model, the classical TF-IDF method, and a C4.5 Bagging similarity-computing tracking method, the proposed method performs better in precision, recall, and F1 value.
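
The abstract does not say how the LDA model is inferred. As a minimal sketch, assuming collapsed Gibbs sampling (a standard LDA inference method, not confirmed as the authors' choice), the snippet below turns tokenized posts into per-post topic proportions: dense, low-dimensional features that ease short-text sparsity. The function name `lda_gibbs`, the hyperparameters `alpha`, `beta`, and `iters`, and the English toy tokens are illustrative assumptions; real Chinese posts would first need word segmentation.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical helper, not the paper's code).

    docs: list of token lists. Returns (theta, vocab), where theta[d][k] is the
    estimated proportion of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    z = []  # current topic assignment of every token

    # Random initial topic assignment for each token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | rest), up to a normalizing constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed document-topic proportions; each row sums to 1.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    return theta, vocab


# Usage on toy pre-segmented posts (English stand-ins for segmented Chinese words).
posts = [
    ["earthquake", "rescue", "casualties"],
    ["rescue", "earthquake", "aftershock"],
    ["stocks", "rise", "index"],
    ["index", "stocks", "fall"],
]
theta, vocab = lda_gibbs(posts, num_topics=2, iters=100)
for post, row in zip(posts, theta):
    print(post, [round(p, 2) for p in row])
```

Each row of `theta` is a probability vector over topics, so even a three-word post gets a full-length feature vector instead of a nearly empty bag of words.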
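
The classification stage can likewise be sketched under stated assumptions: each bootstrap resample of the training examples trains one simplified, C4.5-style tree (binary threshold splits chosen by gain ratio, a depth limit, and no pruning or missing-value handling), and the trees are combined by majority vote. Treating the share of votes for the tracked topic as the similarity score, and the toy topic-proportion vectors in the usage lines, are hypothetical choices; the abstract specifies neither. Precision, recall, and F1 are computed in the usual way for the positive (on-topic) class.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_split(X, y):
    """Best binary threshold split by gain ratio: (ratio, feature, threshold) or None."""
    base, n, best = entropy(y), len(y), None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint of distinct values, so both sides are non-empty
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            pl, pr = len(left) / n, len(right) / n
            gain = base - pl * entropy(left) - pr * entropy(right)
            split_info = -(pl * math.log2(pl) + pr * math.log2(pr))
            ratio = gain / split_info
            if best is None or ratio > best[0]:
                best = (ratio, f, t)
    return best


def build_tree(X, y, depth):
    """Depth-limited tree: a leaf is a class label, an inner node is a dict."""
    majority = Counter(y).most_common(1)[0][0]
    split = best_split(X, y) if depth > 0 and len(set(y)) > 1 else None
    if split is None or split[0] <= 0:
        return majority
    _, f, t = split
    go_left = [row[f] <= t for row in X]
    left_X = [r for r, g in zip(X, go_left) if g]
    left_y = [lab for lab, g in zip(y, go_left) if g]
    right_X = [r for r, g in zip(X, go_left) if not g]
    right_y = [lab for lab, g in zip(y, go_left) if not g]
    return {"feature": f, "threshold": t,
            "left": build_tree(left_X, left_y, depth - 1),
            "right": build_tree(right_X, right_y, depth - 1)}


def predict_tree(node, x):
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node


def bagging_fit(X, y, n_trees=15, depth=3, seed=0):
    """Train one tree per bootstrap resample (len(X) draws with replacement)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], depth))
    return trees


def bagging_predict(trees, x):
    """Combine the trees by majority vote."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]


def vote_share(trees, x, label=1):
    """Hypothetical similarity score: fraction of trees voting for `label`."""
    return sum(predict_tree(t, x) == label for t in trees) / len(trees)


def precision_recall_f1(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage on toy 2-topic proportion vectors (1 = on-topic, 0 = off-topic).
X = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
y = [1, 1, 1, 0, 0, 0]
trees = bagging_fit(X, y)
pred = [bagging_predict(trees, x) for x in X]
print(precision_recall_f1(y, pred), vote_share(trees, [0.7, 0.3]))
```

Bootstrap resampling makes each tree see a slightly different training set, so the vote averages out the variance of individual trees; this is the usual motivation for pairing Bagging with unstable learners such as decision trees.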

Info:

Pages: 2816-2820

Online since: September 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved