Improved Nonnegative Matrix Factorization Based Feature Selection for High Dimensional Data Analysis

Abstract:

Feature selection has become a focus of research in application areas with high-dimensional data. Nonnegative matrix factorization (NMF) is an effective method for dimensionality reduction, but because it is a feature extraction method it cannot select an optimal feature subset. In this paper, a two-step method based on an improved NMF is proposed. In the first step, the bases of each category in the dataset are obtained by NMF; added constraints guarantee that these bases are sparse and largely distinct from one another, which benefits classification, and an auxiliary function is used to prove that the algorithm converges. In the second step, the classic ReliefF algorithm weights each feature using all the basis vectors and selects the optimal feature subset. The experimental results show that the proposed method selects a representative and relevant feature subset that is effective in improving the performance of the classifier.
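
The procedure above is described only at a high level. As a rough illustration of the two-step idea, the sketch below substitutes scikit-learn's standard (unconstrained) NMF for the paper's sparsity-constrained variant, and a simplified Relief weighting (a single nearest hit and miss per basis vector, rather than full ReliefF with k neighbors); all function names, parameter values, and the toy data are illustrative assumptions, not the authors' implementation.

import numpy as np
from sklearn.decomposition import NMF

def class_bases(X, y, n_components=5, seed=0):
    # Step 1: factorize each category's samples separately; the rows of H
    # (model.components_) are that category's basis vectors in feature space.
    bases, labels = [], []
    for c in np.unique(y):
        model = NMF(n_components=n_components, init="nndsvda",
                    max_iter=500, random_state=seed)
        model.fit(X[y == c])             # X[y == c] ~ W @ H, all nonnegative
        bases.append(model.components_)  # H: (n_components, n_features)
        labels.append(np.full(n_components, c))
    return np.vstack(bases), np.concatenate(labels)

def relief_weights(B, yb):
    # Step 2 (simplified Relief): raise a feature's weight when it separates
    # a basis vector from its nearest other-class basis (the "miss") and
    # lower it when it separates the basis from its nearest same-class
    # basis (the "hit").
    n, d = B.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(B - B[i]).sum(axis=1)   # L1 distance to every basis
        dist[i] = np.inf                      # exclude self from the search
        hit = np.argmin(np.where(yb == yb[i], dist, np.inf))
        miss = np.argmin(np.where(yb != yb[i], dist, np.inf))
        w += np.abs(B[i] - B[miss]) - np.abs(B[i] - B[hit])
    return w / n

# Usage on toy nonnegative data: keep the k highest-weighted features.
X = np.abs(np.random.randn(60, 40))
y = np.repeat([0, 1, 2], 20)
B, yb = class_bases(X, y)
w = relief_weights(B, yb)
selected = np.argsort(w)[::-1][:10]           # indices of the top-10 features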

Info:

Pages: 2344-2348
Online since: August 2013
Copyright: © 2013 Trans Tech Publications Ltd. All Rights Reserved
