Feature Selection Algorithm for Hyperlipidemia Classification


Abstract:

This paper reports a comparative study of feature selection algorithms on a hyperlipidemia data set. Three feature selection methods were evaluated: document frequency (DF), information gain (IG), and the χ2 statistic (CHI). The classification systems represent each document as a vector and compute term weights with tfidfie (term frequency, inverse document frequency, and inverse entropy). To compare the effectiveness of feature selection, we used three classification methods: Naïve Bayes (NB), k-Nearest Neighbor (kNN), and Support Vector Machines (SVM). The experimental results show that IG and CHI significantly outperform DF, and that SVM and NB are more effective than kNN when the macro-averaged F1 measure is used. DF is suitable for large-scale text classification tasks.
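The three scoring criteria and the evaluation measure named above can be illustrated with a minimal sketch. It assumes the commonly used text-categorization definitions (e.g. Yang and Pedersen, 1997): binary term presence per document, natural logarithms, and CHI taken as the maximum of χ2(t, c) over classes. The abstract does not state which CHI aggregation the authors used, and the tfidfie weighting and the NB/kNN/SVM classifiers are not reproduced here. All function names are hypothetical illustrations, not the paper's code.

```python
import math
from collections import Counter


def feature_scores(docs, labels):
    """Return {term: (DF, IG, CHI_max)} for a labelled corpus.

    docs   -- list of token lists, one per document (hypothetical input format)
    labels -- list of class labels, same length as docs
    Only presence/absence of a term in a document is counted.
    """
    n = len(docs)
    classes = sorted(set(labels))
    class_count = Counter(labels)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for t in vocab:
        # a[c]: number of documents of class c that contain term t
        a = Counter(lab for s, lab in zip(doc_sets, labels) if t in s)
        df = sum(a.values())               # document frequency of t
        p_t, p_not_t = df / n, 1 - df / n
        ig, chi_max = 0.0, 0.0
        for c in classes:
            n_c = class_count[c]
            A = a[c]            # t present, class c
            B = df - A          # t present, other classes
            C = n_c - A         # t absent, class c
            D = n - df - C      # t absent, other classes
            p_c = n_c / n
            # IG = H(C) - P(t) H(C|t) - P(not t) H(C|not t)
            ig -= p_c * math.log(p_c)
            if A:
                ig += p_t * (A / df) * math.log(A / df)
            if C:
                ig += p_not_t * (C / (n - df)) * math.log(C / (n - df))
            # chi^2(t, c) = N (AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))
            denom = (A + C) * (B + D) * (A + B) * (C + D)
            chi = n * (A * D - C * B) ** 2 / denom if denom else 0.0
            chi_max = max(chi_max, chi)
        scores[t] = (df, ig, chi_max)
    return scores


def select_top_k(scores, k, method):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    idx = {"df": 0, "ig": 1, "chi": 2}[method]
    return sorted(scores, key=lambda t: (-scores[t][idx], t))[:k]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)


if __name__ == "__main__":
    # Toy corpus for illustration only; not data from the paper.
    docs = [["cholesterol", "high", "statin"], ["cholesterol", "triglyceride"],
            ["normal", "diet"], ["normal", "exercise", "diet"]]
    labels = ["hyper", "hyper", "normal", "normal"]
    scores = feature_scores(docs, labels)
    print(select_top_k(scores, 2, "chi"))
    print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

In this toy corpus, "cholesterol" occurs only in the two hyperlipidemia documents, so it reaches the maximal IG of ln 2 and a χ2 of 4, while DF alone cannot distinguish it from equally frequent terms such as "normal" or "diet". This shows why class-aware criteria like IG and CHI can rank terms differently from DF. Macro-averaging gives every class equal weight regardless of its size.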


Info:

Pages: 110-113

Online since: December 2014


Copyright: © 2015 Trans Tech Publications Ltd. All Rights Reserved

