Redundant Feature Selection Methods in Text Classification

Abstract:

Feature selection is an effective pre-processing technique for text mining on high-dimensional feature spaces. In recent years, many redundant feature selection methods have been proposed from different motivations, yet a comparative experimental study of these methods in the field of text mining has not been reported. To fill this gap, this paper presents an extensive empirical comparison of redundant feature selection methods on the task of text classification. The experimental results indicate that 3-way mutual information captures redundancy much better than traditional 2-way mutual information, since the 3-way quantity takes the class label into account. As a result, redundant feature selection methods based on 3-way mutual information outperform the other methods.
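
To make the contrast concrete, here is a minimal sketch (not from the paper) of how the two quantities can be estimated with simple plug-in counts. The toy data, the variable names, and the choice of binary term-occurrence features are illustrative assumptions; the paper's own estimators and datasets may differ.

```python
# Sketch: 2-way mutual information I(X;Y) vs. the class-conditional
# quantity I(X;Y|C), estimated by plain counting on discrete samples.
# All data below is synthetic and for illustration only.
from collections import Counter
from math import log2

def mi(pairs):
    """Plug-in estimate of I(X; Y) from a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def cmi(triples):
    """I(X; Y | C) = sum over classes c of p(c) * I(X; Y | C=c)."""
    n = len(triples)
    by_class = {}
    for x, y, c in triples:
        by_class.setdefault(c, []).append((x, y))
    return sum((len(v) / n) * mi(v) for v in by_class.values())

# Toy samples of (term1, term2, class): binary occurrence features.
data = [(1, 1, 1), (1, 1, 1), (1, 0, 1), (0, 1, 1),
        (1, 1, 0), (0, 0, 0), (0, 0, 0), (0, 1, 0)]

two_way = mi([(x, y) for x, y, _ in data])
cond = cmi(data)
three_way = two_way - cond  # interaction information I(X; Y; C)
print(f"2-way  I(X;Y)   = {two_way:.3f}")
print(f"       I(X;Y|C) = {cond:.3f}")
print(f"3-way  I(X;Y;C) = {three_way:.3f}")
```

Under one common sign convention, the interaction information is I(X;Y;C) = I(X;Y) - I(X;Y|C); a clearly positive value signals that two features are redundant with respect to the class label, which is exactly what a 2-way score I(X;Y) alone cannot distinguish from label-irrelevant dependence.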

Periodical:

Advanced Materials Research (Volumes 1044-1045)

Pages:

1258-1261

Online since:

October 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
