Discussion of Classification for Imbalanced Data Sets

Abstract:

Most classifiers perform poorly when the class distribution is imbalanced, a situation that is nevertheless common and significant in practice. As a result, learning from imbalanced data sets has attracted growing attention in recent years. This paper provides a comprehensive review of the classification of imbalanced data sets: the nature of the problem, the factors that affect it, the assessment metrics currently used to evaluate learning performance, and the opportunities and challenges in learning from imbalanced data.
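The point about assessment metrics can be illustrated with a small sketch showing why overall accuracy is misleading under class imbalance. The abstract does not say which metrics the paper reviews; this sketch assumes commonly used ones (precision, recall, specificity, F-measure and G-mean, all derived from the confusion-matrix counts TP, FP, FN and TN with the minority class as positive). The helper names and the 95:5 toy data set are hypothetical and chosen only for illustration.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN), treating `positive` as the minority class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(num, den):
    """Division that returns 0.0 instead of raising when den is 0."""
    return num / den if den else 0.0


def imbalance_metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: accuracy plus common class-sensitive metrics."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    recall = safe_div(tp, tp + fn)        # true positive rate on the minority class
    specificity = safe_div(tn, tn + fp)   # true negative rate on the majority class
    precision = safe_div(tp, tp + fp)
    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": safe_div(2 * precision * recall, precision + recall),
        "g_mean": sqrt(recall * specificity),
    }


# Toy data (an assumption): 95 majority (0) and 5 minority (1) examples.
y_true = [0] * 95 + [1] * 5

# A trivial classifier that always predicts the majority class.
always_majority = [0] * 100

# A classifier that finds 4 of the 5 minority examples at the cost of
# 10 false alarms on the majority class.
detector = [0] * 85 + [1] * 10 + [1] * 4 + [0]

for name, y_pred in [("always majority", always_majority), ("detector", detector)]:
    scores = imbalance_metrics(y_true, y_pred)
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

On this toy data the trivial classifier reaches 0.95 accuracy while its minority-class recall, F-measure and G-mean are all 0. The second classifier has lower accuracy (0.89) but detects 4 of the 5 minority examples, and the class-sensitive metrics reward it accordingly.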

Info:

Periodical:

Advanced Materials Research (Volumes 546-547)

Pages:

622-627

Online since:

July 2012

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved

References:

[1] P.-N. Tan and M. Steinbach, Introduction to Data Mining, pp. 127-187, (2005).

[2] Y. Sun, M. S. Kamel and A. K. C. Wong, Cost-sensitive boosting for classification of imbalanced data, Pattern Recognition, vol. 40, pp. 3358-3378, (2007).

DOI: 10.1016/j.patcog.2007.04.009

[3] H. He and E. A. Garcia, Learning from Imbalanced Data, IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263-1284, (2009).

DOI: 10.1109/tkde.2008.239

[4] S. Visa and A. Ralescu, Issues in Mining Imbalanced Data Sets: A Review Paper, Proceedings of the Midwest Artificial Intelligence and Cognitive Science Conference, pp. 67-73, (2005).

[5] G. E. A. P. A. Batista, R. C. Prati and M. C. Monard, A study of the behavior of several methods for balancing machine learning training data, SIGKDD Explorations (Special Issue on Learning from Imbalanced Datasets), vol. 6, no. 1, pp. 20-29, (2004).

DOI: 10.1145/1007730.1007735

[6] N. Japkowicz and S. Stephen, The class imbalance problem: a systematic study, Intell. Data Anal. J., vol. 6, no. 5, pp. 429-450, (2002).

[7] G. Weiss and F. Provost, Learning when training data are costly: the effect of class distribution on tree induction, J. Artif. Intell. Res., vol. 19, pp. 315-354, (2003).

DOI: 10.1613/jair.1199

[8] M. V. Joshi, Learning classifier models for predicting rare phenomena, Ph.D. Thesis, University of Minnesota, Twin Cities, MN, USA, (2002).

[9] N. Japkowicz and S. Stephen, The class imbalance problem: a systematic study, Intell. Data Anal. J., vol. 6, no. 5, pp. 429-450, (2002).

[10] N. Japkowicz, Concept-learning in the presence of between-class and within-class imbalance, Proceedings of the Fourteenth Conference of the Canadian Society for Computational Studies of Intelligence, Ottawa, Canada, pp. 67-77, June (2001).

DOI: 10.1007/3-540-45153-6_7

[11] R. Akbani, S. Kwek and N. Japkowicz, Applying support vector machines to imbalanced datasets, Proceedings of the European Conference on Machine Learning, Pisa, Italy, pp. 39-50, September (2004).

DOI: 10.1007/978-3-540-30115-8_7

[12] B. Raskutti and A. Kowalczyk, Extreme rebalancing for SVMs: a case study, Proceedings of the European Conference on Machine Learning, Pisa, Italy, pp. 60-69, September (2004).

[13] G. Wu and E. Y. Chang, Class-boundary alignment for imbalanced dataset learning, Proceedings of the ICML '03 Workshop on Learning from Imbalanced Data Sets, Washington, DC, August (2003).

[14] K. Ezawa, M. Singh and S. W. Norton, Learning goal oriented Bayesian networks for telecommunications risk management, Proceedings of the Thirteenth International Conference on Machine Learning, Bari, Italy, pp. 139-147, (1996).

[15] J. Zhang and I. Mani, KNN approach to unbalanced data distributions: a case study involving information extraction, Proceedings of the ICML '03 Workshop on Learning from Imbalanced Data Sets, Washington, DC, August (2003).

[16] X. Liu, J. Wu and Z. Zhou, Exploratory Under-Sampling for Class-Imbalance Learning, Proc. Int'l Conf. Data Mining, pp. 965-969, (2006).

[17] N. V. Chawla, K. W. Bowyer, L. O. Hall and W. P. Kegelmeyer, SMOTE: Synthetic Minority Over-Sampling Technique, J. Artificial Intelligence Research, vol. 16, pp. 321-357, (2002).

DOI: 10.1613/jair.953

[18] N. V. Chawla, A. Lazarevic, L. O. Hall and K. W. Bowyer, SMOTEBoost: Improving Prediction of the Minority Class in Boosting, Proc. Seventh European Conf. Principles and Practice of Knowledge Discovery in Databases, pp. 107-119, (2003).

DOI: 10.1007/978-3-540-39804-2_12

[19] L. Breiman, Bagging Predictors, Machine Learning, vol. 24, no. 2, pp. 123-140, (1996).

[20] L. Breiman, Random Forests, Machine Learning, vol. 45, no. 1, pp. 5-32, (2001).

[21] Y. Freund and R. E. Schapire, Experiments with a new boosting algorithm, Proceedings of the Thirteenth International Conference on Machine Learning, The MIT Press, Cambridge, MA, Morgan Kaufmann, Los Altos, CA, pp. 148-156, (1996).

[22] R. Agarwal and M. V. Joshi, PNrule: A New Framework for Learning Classifier Models in Data Mining (A Case-Study in Network Intrusion Detection), Technical Report TR 00-01, Department of Computer Science, University of Minnesota, USA, (2000).

DOI: 10.1137/1.9781611972719.29

[23] C. Elkan, The Foundations of Cost-Sensitive Learning, Proc. Int'l Joint Conf. Artificial Intelligence, pp. 973-978, (2001).

[24] K. M. Ting, An Instance-Weighting Method to Induce Cost-Sensitive Trees, IEEE Trans. Knowledge and Data Eng., vol. 14, no. 3, pp. 659-665, (2002).

DOI: 10.1109/tkde.2002.1000348

[25] W. Fan, S. Stolfo and J. Zhang, AdaCost: Misclassification Cost-Sensitive Boosting, Proceedings of the 16th International Conference on Machine Learning, pp. 97-105, (1999).

[26] J. Wu, H. Xiong and J. Chen, COG: local decomposition for rare class analysis, Data Mining and Knowledge Discovery, vol. 20, no. 2, (2010).

DOI: 10.1007/s10618-009-0146-1
