Redundant Feature Selection Methods in Text Classification

Abstract:

Feature selection is an effective pre-processing technique for text mining on high-dimensional feature spaces. In recent years, many redundant feature selection methods have been proposed from different motivations, yet a comparative experimental study of these methods in the field of text mining has not been reported. To fill this gap, this paper presents an extensive empirical comparison of redundant feature selection methods on the task of text classification. The experimental results indicate that 3-way mutual information captures redundancy much better than traditional 2-way mutual information, since the 3-way quantity takes the class label into account. As a result, redundant feature selection methods based on 3-way mutual information outperform the other methods.
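
To make the contrast concrete, here is a minimal sketch (not from the paper) of how the two quantities can be estimated with simple plug-in counts. The toy data, the variable names, and the choice of binary term-occurrence features are illustrative assumptions; the paper's own estimators and datasets may differ.

```python
# Sketch: 2-way mutual information I(X;Y) vs. the class-conditional
# quantity I(X;Y|C), estimated by plain counting on discrete samples.
# All data below is synthetic and for illustration only.
from collections import Counter
from math import log2

def mi(pairs):
    """Plug-in estimate of I(X; Y) from a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def cmi(triples):
    """I(X; Y | C) = sum over classes c of p(c) * I(X; Y | C=c)."""
    n = len(triples)
    by_class = {}
    for x, y, c in triples:
        by_class.setdefault(c, []).append((x, y))
    return sum((len(v) / n) * mi(v) for v in by_class.values())

# Toy samples of (term1, term2, class): binary occurrence features.
data = [(1, 1, 1), (1, 1, 1), (1, 0, 1), (0, 1, 1),
        (1, 1, 0), (0, 0, 0), (0, 0, 0), (0, 1, 0)]

two_way = mi([(x, y) for x, y, _ in data])
cond = cmi(data)
three_way = two_way - cond  # interaction information I(X; Y; C)
print(f"2-way  I(X;Y)   = {two_way:.3f}")
print(f"       I(X;Y|C) = {cond:.3f}")
print(f"3-way  I(X;Y;C) = {three_way:.3f}")
```

Under one common sign convention, the interaction information is I(X;Y;C) = I(X;Y) - I(X;Y|C); a clearly positive value signals that two features are redundant with respect to the class label, which is exactly what a 2-way score I(X;Y) alone cannot distinguish from label-irrelevant dependence.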

Periodical:

Advanced Materials Research (Volumes 1044-1045)

Pages:

1258-1261

Online since:

October 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
