Performance Improvement in the Pattern Classification of Nominal Data Sets Applying Multiple Correspondence Analysis

Abstract:

Classification is a supervised learning problem in which data instances are assigned to different classes. It is solved by algorithms (classifiers) that look for patterns of relationships between classes in known cases and use these relationships to classify unknown cases. The performance of classifiers depends substantially on the data types. Focusing on the proper treatment of nominal data, this paper shows that applying prior transformations to the data can substantially improve classifier performance, bringing significant benefits to the outcome of the whole Knowledge Discovery in Databases (KDD) process. The paper uses three data sets with nominal data and two well-known classifiers: Linear Discriminant Analysis (LDA) and Naïve Bayes (NB). For data transformation, it applies an approach called Geometric Data Analysis (GDA). The GDA techniques compared are the traditional Principal Component Analysis (PCA) and the underexplored Multiple Correspondence Analysis (MCA). The results confirm that GDA transformations can improve classification accuracy and demonstrate the superiority of MCA over its precursor, PCA, when applied to nominal data.
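The abstract does not describe how the MCA transformation is computed. The following is a minimal sketch, assuming the standard formulation of MCA as correspondence analysis of the complete disjunctive (one-hot indicator) table; it is not the paper's implementation, and the function names `indicator_matrix` and `mca_row_coordinates` and the toy data are hypothetical. It builds the indicator matrix, computes the standardized residuals, and takes their singular value decomposition to obtain row principal coordinates, which could then be passed as numeric features to a classifier such as LDA or NB.

```python
import numpy as np

# Assumption: MCA computed as correspondence analysis of the indicator matrix.
# This is an illustrative sketch, not the method code used in the paper.

def indicator_matrix(rows):
    """Build the complete disjunctive (one-hot) table Z for nominal data.

    rows: list of tuples, one tuple of category labels per instance.
    Returns Z (n_instances x n_categories) and the (variable, category) columns.
    """
    n_vars = len(rows[0])
    columns = []
    for j in range(n_vars):
        for cat in sorted({r[j] for r in rows}):
            columns.append((j, cat))
    index = {col: k for k, col in enumerate(columns)}
    Z = np.zeros((len(rows), len(columns)))
    for i, r in enumerate(rows):
        for j, cat in enumerate(r):
            Z[i, index[(j, cat)]] = 1.0
    return Z, columns


def mca_row_coordinates(Z, n_components=2):
    """Return row principal coordinates and principal inertias of Z."""
    P = Z / Z.sum()             # correspondence matrix
    r = P.sum(axis=1)           # row masses
    c = P.sum(axis=0)           # column masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    # Row principal coordinates: F = D_r^{-1/2} U Sigma
    F = (U / np.sqrt(r)[:, None]) * sigma
    return F[:, :n_components], sigma[:n_components] ** 2


# Hypothetical toy data set: two nominal variables, five instances.
data = [("red", "small"), ("red", "large"), ("blue", "small"),
        ("blue", "large"), ("red", "small")]
Z, cols = indicator_matrix(data)
F, inertias = mca_row_coordinates(Z, n_components=2)
print(F.shape, inertias)
```

The squared singular values are the principal inertias; for an indicator table with J categories over Q variables they sum to J/Q - 1, so the toy example has a total inertia of 1. Singular vectors are determined only up to sign, so coordinate signs may differ between implementations.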

Info:

Pages: 1482-1487

Online since: October 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
