Efficient Feature Selection Model for Gene Expression Data

Abstract:

Finding a subset of informative genes is crucial for studying biological processes, because the number of genes has increased sharply and most of them are not related to the others. In general, a feature selection technique consists of two steps: 1) all genes are ranked by a filter approach, and 2) the ranked list is passed to a wrapper approach. Nevertheless, the resulting accuracy of gene recognition is not sufficient. Therefore, this paper proposes an efficient feature selection model for gene expression data. First, two filter approaches, Correlation-based Feature Selection (Cfs) and Gain Ratio (GR), are used to define multiple attribute subsets. Second, a wrapper approach based on Support Vector Machine (SVM) and Random Forest (RF) is used to evaluate attribute subsets of each length. The experimental results show that CfsSVM, CfsRF, GRSVM, and GRRF based on the proposed model achieve higher accuracy rates of 87.10%, 90.32%, 87.10%, and 88.71%, respectively.
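
The filter-then-wrapper idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: expression values are already discretised, the filter is a plain Gain Ratio ranking, "each length" is read as every prefix of the ranked list, and a leave-one-out 1-nearest-neighbour classifier stands in for SVM and RF so that the example needs only the standard library. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by H(feature)."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    split_info = entropy(feature)
    if split_info == 0.0:
        return 0.0  # a constant feature carries no information
    return (entropy(labels) - conditional) / split_info


def rank_genes(X, y):
    """Filter step: gene (column) indices sorted by decreasing gain ratio."""
    n_genes = len(X[0])
    scores = [gain_ratio([row[j] for row in X], y) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of 1-NN (Hamming distance) using only `genes`.

    Assumption: 1-NN is a stand-in for the SVM and RF wrappers in the paper.
    """
    correct = 0
    for i in range(len(X)):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum(X[i][g] != X[j][g] for g in genes),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def select_genes(X, y):
    """Wrapper step: score every prefix length of the ranking, keep the best."""
    ranking = rank_genes(X, y)
    best_genes, best_acc = [], -1.0
    for k in range(1, len(ranking) + 1):
        acc = loo_accuracy(X, y, ranking[:k])
        if acc > best_acc:  # strict '>' keeps the shorter subset on ties
            best_genes, best_acc = ranking[:k], acc
    return best_genes, best_acc


# Made-up toy data: 6 samples x 3 discretised genes; gene 0 mirrors the class.
X = [
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
]
y = [0, 0, 0, 1, 1, 1]
genes, acc = select_genes(X, y)
```

On this toy data the ranking puts gene 0 first, and the wrapper stops improving after the first gene. With real expression data, the stand-in classifier would be replaced by the SVM or RF evaluation the abstract describes, and the accuracy estimate would normally come from a held-out or cross-validated split.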

Info:

Pages:

1948-1952

Online since:

October 2011

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
