A Comparative Study on Feature Selection in Chinese Text Classification Problem
Abstract:
The information explosion poses many challenges for text classification. The curse of dimensionality sharply increases computational complexity and lowers classification accuracy, so it is critical to apply feature selection before the actual classification. Automatic classification of English text has been studied for many years, but Chinese text has received little attention. In this paper, several classic feature selection methods, namely term frequency (TF), information gain (IG), and the chi-square statistic (CHI), are compared on Chinese text classification. We also take imbalanced data into consideration. Experimental results show that CHI performs better than IG and TF when the dataset is imbalanced, but there is no obvious difference among the methods on balanced data.
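As a rough illustration of two of the scoring methods named in the abstract, the sketch below computes CHI and IG for a single term and a single class from a 2x2 document-count table. The abstract does not give the formulas, so this assumes the standard textbook definitions; the function names and the count layout (a, b, c, d) are hypothetical and not taken from the paper.

```python
import math


def chi_square(a, b, c, d):
    """Chi-square score of a term for one class (assumed textbook form).

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so the term carries no signal
    return n * (a * d - c * b) ** 2 / denom


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def information_gain(a, b, c, d):
    """Information gain of a term for a class-vs-rest split (assumed form)."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    prior = entropy([a + c, b + d])    # class entropy before seeing the term
    with_term = entropy([a, b])        # class entropy among docs with the term
    without_term = entropy([c, d])     # class entropy among docs without it
    return prior - (a + b) / n * with_term - (c + d) / n * without_term
```

Under these definitions, a term that occurs in every document of a class and in no others gets the highest scores, while a term spread evenly across classes scores zero on both measures. Feature selection then keeps the top-scoring terms.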
Info:
Periodical:
Pages: 2854-2857
Online since: August 2013
Authors:
Keywords:
Copyright: © 2013 Trans Tech Publications Ltd. All Rights Reserved