Performance Improvement in the Pattern Classification of Nominal Data Sets Applying Multiple Correspondence Analysis
Abstract:
Classification is a supervised learning problem in which data instances are discriminated into different classes. The problem is solved by algorithms (classifiers) that look for patterns of relationships between classes in known cases and use these relationships to classify unknown cases. The performance of a classifier depends substantially on the data types involved. To give proper treatment to nominal data, this paper shows that applying a transformation before classification can substantially improve classifier performance, bringing significant benefits to the result of the whole process of Knowledge Discovery in Databases (KDD). The paper uses three different data sets with nominal data and two well-known classifiers: Linear Discriminant Analysis (LDA) and Naïve Bayes (NB). For data transformation, the paper applies an approach called Geometric Data Analysis (GDA). The GDA techniques compared are the traditional Principal Component Analysis (PCA) and the underexplored Multiple Correspondence Analysis (MCA). The results confirm the capability of the GDA transformation to improve classification accuracy and attest to the superiority of MCA over its precursor, PCA, when applied to nominal data.
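The transformation step described in the abstract can be illustrated with a minimal sketch of MCA computed as a correspondence analysis of the indicator (one-hot) matrix. This is not the authors' implementation: the function names `one_hot` and `mca_row_coords`, the toy data, and the choice of two components are assumptions made purely for illustration. The resulting row coordinates are numeric features that a classifier such as LDA or NB could then be trained on.

```python
import numpy as np

def one_hot(X):
    """Indicator matrix: one 0/1 column per (variable, category) pair."""
    blocks = []
    for j in range(X.shape[1]):
        cats, inv = np.unique(X[:, j], return_inverse=True)
        blocks.append(np.eye(len(cats))[inv])
    return np.hstack(blocks)

def mca_row_coords(X, n_components=2):
    """Row principal coordinates from a correspondence analysis of the indicator matrix."""
    Z = one_hot(X)
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    F = (U * s) / np.sqrt(r)[:, None]    # row principal coordinates
    return F[:, :n_components], s[:n_components] ** 2

# Hypothetical toy data: six instances, two nominal attributes.
X = np.array([
    ["red", "small"],
    ["red", "large"],
    ["blue", "small"],
    ["blue", "large"],
    ["green", "small"],
    ["red", "small"],
])

coords, inertia = mca_row_coords(X, n_components=2)
print(coords.shape)   # one row of numeric coordinates per instance
```

Instances with identical category profiles receive identical coordinates, and the squared singular values give the inertia carried by each retained axis. How many axes to keep is a separate choice not settled by this sketch.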
Info:
Pages:
1482-1487
Online since:
October 2014
Copyright:
© 2014 Trans Tech Publications Ltd. All Rights Reserved