Text Classification Combined an Improved CHI and Category Relevance Factor

Abstract:

Text classification is the task of assigning natural-language documents to predefined categories based on their content. The main aim of this paper is to improve the accuracy of a text classification system by combining an improved CHI (chi-square) method with a category relevance factor (CRF). First, an improved CHI method is used to select features from the raw feature set in order to reduce its dimensionality. Second, the TF-CRF method is used to calculate feature weights; this method mainly takes into account that a feature is distributed differently across categories. Finally, we carried out a series of experiments comparing our method with other methods using the F1-measure. Experimental results show that the new method achieves a notable improvement in every category.
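The abstract describes three steps: CHI-based feature selection, TF-CRF feature weighting, and evaluation with the F1-measure. The Python sketch below illustrates them on a toy corpus under several assumptions. It uses the standard CHI statistic computed from a 2x2 document contingency table and scored as the maximum over categories, not the paper's improved CHI, whose modification is not described here. The CRF formula, the log of the share of in-category documents containing a term divided by the share of out-of-category documents containing it, with add-one smoothing, is one plausible formulation assumed for illustration and may differ from the authors' definition. All function names (`contingency`, `chi_square`, `select_features`, `crf`, `tf_crf_weights`, `f1_score`) are hypothetical.

```python
import math
from collections import Counter

# A document is (tokens, label); tokens is a list of strings.


def contingency(docs, term, category):
    """Count the 2x2 table for (term, category):
    A = docs in category containing term, B = docs outside category containing term,
    C = docs in category without term,    D = docs outside category without term."""
    A = B = C = D = 0
    for tokens, label in docs:
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            A += 1
        elif has_term:
            B += 1
        elif in_cat:
            C += 1
        else:
            D += 1
    return A, B, C, D


def chi_square(A, B, C, D):
    """Standard CHI statistic: N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0


def select_features(docs, k):
    """Keep the k terms with the highest max-over-categories CHI score
    (plain CHI, NOT the paper's improved variant)."""
    vocab = sorted({t for tokens, _ in docs for t in tokens})
    cats = sorted({label for _, label in docs})
    score = {t: max(chi_square(*contingency(docs, t, c)) for c in cats) for t in vocab}
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]


def crf(docs, term, category):
    """ASSUMED category relevance factor: log of (share of in-category docs
    containing term) / (share of out-of-category docs containing term).
    Add-one smoothing (also an assumption) avoids log(0) and division by zero."""
    A, B, C, D = contingency(docs, term, category)
    in_share = (A + 1) / (A + C + 2)
    out_share = (B + 1) / (B + D + 2)
    return math.log(in_share / out_share)


def tf_crf_weights(tokens, features, docs, category):
    """TF-CRF weight, relative to one category, of each selected feature
    present in a document: raw term frequency times CRF."""
    tf = Counter(tokens)
    return {t: tf[t] * crf(docs, t, category) for t in features if tf[t] > 0}


def f1_score(y_true, y_pred, category):
    """Per-category F1 = 2PR / (P + R); taken as 0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    docs = [
        (["goal", "match", "team"], "sport"),
        (["team", "score", "goal"], "sport"),
        (["stock", "market", "price"], "finance"),
        (["price", "bank", "market"], "finance"),
    ]
    features = select_features(docs, 4)
    print(features)  # ['goal', 'market', 'price', 'team']
    print(tf_crf_weights(["goal", "goal", "team"], features, docs, "sport"))
    print(f1_score(["sport", "sport"], ["sport", "finance"], "sport"))
```

On this four-document corpus, the terms that occur in every document of exactly one category (`goal`, `team`, `market`, `price`) each reach a CHI score of 4.0 and are kept, while terms seen in only one document score lower. Aggregating CHI by the maximum over categories is one common choice; averaging over categories is another. Training a classifier on the weighted vectors is omitted from the sketch.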

Info:

Periodical: Advanced Materials Research (Volumes 524-527)

Pages: 3866-3869

Online since: May 2012

Copyright: © 2012 Trans Tech Publications Ltd. All Rights Reserved
