Comparison and Improvements of Feature Extraction Methods for Text Categorization

Abstract:

Feature extraction is a key step in text categorization [1], and its accuracy directly affects the accuracy of text classification. This paper introduces and compares four commonly used text feature extraction methods: IG (information gain), MI (mutual information), CHI (chi-square statistic), and DF (document frequency), and proposes an improved method based on CHI. Experimental results show that the proposed method can improve the accuracy of text categorization.
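
As a rough illustration of the four baseline measures named in the abstract, the sketch below scores a single term against a single category from a 2x2 document-count table. It uses the common textbook definitions, which are an assumption here: the preview does not give the paper's own formulas, and the proposed CHI-based improvement is not shown. The count names `a`, `b`, `c`, `d` and all function names are hypothetical.

```python
import math

# Assumed 2x2 document counts for one term t and one category k:
#   a: documents in k that contain t      b: documents outside k that contain t
#   c: documents in k that lack t         d: documents outside k that lack t


def document_frequency(a, b, c, d):
    """DF: number of documents that contain the term."""
    return a + b


def chi_square(a, b, c, d):
    """CHI: chi-square statistic of term/category dependence."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of dependence
    return n * (a * d - c * b) ** 2 / denom


def mutual_information(a, b, c, d):
    """MI: pointwise mutual information between term presence and category."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # term never occurs in the category
    return math.log(a * n / ((a + c) * (a + b)))


def _entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def information_gain(a, b, c, d):
    """IG: drop in category entropy once term presence/absence is known."""
    n = a + b + c + d
    h_category = _entropy([a + c, b + d])
    h_given_term = ((a + b) / n) * _entropy([a, b]) + ((c + d) / n) * _entropy([c, d])
    return h_category - h_given_term


if __name__ == "__main__":
    counts = (40, 10, 10, 40)  # made-up counts, for illustration only
    print("DF :", document_frequency(*counts))
    print("CHI:", chi_square(*counts))
    print("MI :", mutual_information(*counts))
    print("IG :", information_gain(*counts))
```

In a feature selection setting, terms would typically be ranked by one of these scores and only the highest-scoring ones kept as features.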

Pages: 1824-1828

Online since: August 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved

[1] Zhou Shui-sheng, Liu Hong-wei, Ye Feng. Variant of Gaussian Kernel and Parameter Setting Method for Nonlinear SVM[C]. Third International Conference on Natural Computation, Haikou, China, 2007.

[2] Huifeng Tang, Songbo Tan, Xueqi Cheng. A survey on sentiment detection of reviews. Expert Systems with Applications, 2009, 36: 10760-10773.

DOI: 10.1016/j.eswa.2009.02.063

[3] GALAVOTTI Luigi, SEBASTIANI. Feature selection and negative evidence in automated text categorization[C]//Proceedings of the ACM KDD-00 Workshop on Text Mining. New York, US: ACM Press, 2000: 40-42.

[4] TSOU B K Y, YUEN R W M, KWONG O Y, et al. Polarity classification of celebrity coverage in the Chinese press[C]//Proceedings of the 2005 International Conference on Intelligence Analysis. Virginia, USA: [s. n.], 2005.

[5] M Utiyama, H Isahara. Large-scale text categorization (in Japanese)[C]. 9th Annual Meeting of the Association (Japan) for Natural Language Processing, 2003: 385-388.