Performance Improvement in the Pattern Classification of Nominal Data Sets Applying Multiple Correspondence Analysis

Rodrigo Clemente Thom de Souza; Maria Teresinha Arns Steiner; Leandro dos Santos Coelho

doi:10.4028/www.scientific.net/AMM.670-671.1482


Abstract:

Classification is a supervised learning problem in which data instances are discriminated into different classes. It is solved by algorithms (classifiers) that look for patterns of relationships between classes in known cases and use these relationships to classify unknown cases. The performance of classifiers depends substantially on the data type. To give nominal data proper treatment, this paper shows that applying a transformation before classification can substantially improve classifier performance, which benefits the result of the whole process of Knowledge Discovery in Databases (KDD). The paper uses three different data sets with nominal data and two well-known classifiers: Linear Discriminant Analysis (LDA) and Naïve Bayes (NB). For data transformation, it applies an approach called Geometric Data Analysis (GDA). The GDA techniques compared are the traditional Principal Component Analysis (PCA) and the underexplored Multiple Correspondence Analysis (MCA). The results confirm that the GDA transformation can improve classification accuracy and attest to the superiority of MCA over its precursor, PCA, when applied to nominal data.
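To make the idea of transforming nominal data more concrete, the following is a minimal sketch of the first step of MCA: building the indicator (one-hot) matrix and the matrix of standardized residuals that MCA then decomposes. It is not the authors' implementation; the toy records and the function names `indicator_matrix` and `standardized_residuals` are assumptions made for illustration. The singular value decomposition that yields the principal axes, and the classifier trained on the resulting coordinates, are omitted.

```python
from math import sqrt


def indicator_matrix(rows):
    """One-hot encode nominal records into a 0/1 indicator matrix."""
    n_vars = len(rows[0])
    # Categories observed for each variable, sorted for reproducibility.
    levels = [sorted({row[q] for row in rows}) for q in range(n_vars)]
    # One column per (variable index, category) pair.
    labels = [(q, level) for q in range(n_vars) for level in levels[q]]
    Z = [[1 if row[q] == level else 0 for q, level in labels] for row in rows]
    return Z, labels


def standardized_residuals(Z):
    """Residuals (p_ij - r_i * c_j) / sqrt(r_i * c_j) of the correspondence matrix."""
    total = sum(sum(row) for row in Z)
    row_mass = [sum(row) / total for row in Z]
    col_mass = [sum(col) / total for col in zip(*Z)]
    return [
        [(z / total - r * c) / sqrt(r * c) for z, c in zip(row, col_mass)]
        for row, r in zip(Z, row_mass)
    ]


# Hypothetical toy data: two nominal variables (colour, size).
rows = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "small"),
    ("green", "large"),
    ("blue", "large"),
]

Z, labels = indicator_matrix(rows)
S = standardized_residuals(Z)

# Total inertia is the sum of squared standardized residuals.
inertia = sum(v * v for row in S for v in row)
print(labels)
print(round(inertia, 6))
```

For an indicator matrix with Q variables and J observed categories in total, the total inertia equals (J - Q) / Q, so this toy example (J = 5, Q = 2) should give a value close to 1.5. This serves as a quick sanity check before the decomposition step.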


Info:

Periodical: Applied Mechanics and Materials (Volumes 670-671)
Pages: 1482-1487
DOI: https://doi.org/10.4028/www.scientific.net/AMM.670-671.1482


Online since: October 2014
Authors: Rodrigo Clemente Thom de Souza*, Maria Teresinha Arns Steiner, Leandro dos Santos Coelho
Keywords: Data Transformation, Geometric Data Analysis, Knowledge Discovery in Databases, Multiple Correspondence Analysis, Nominal Data, Pattern Classification


* - Corresponding Author
