Paper Titles

Printing Digital Recognition Method Based on the Cascade Classification
p.599

Attribute-Tree Based Hierarchical Hidden Credential Model
p.604

Flame Image Processing-Based Intelligent Networked Control System of Roller Kiln
p.612

The Algebra Properties of the S-Boxes of Several Block Ciphers
p.617

Discussion of Classification for Imbalanced Data Sets
p.622

Correlation Analysis Method and its Application to Interpretation of Regional Gravity and Magnetic Anomalies in Eastern Xinjiang, China
p.628

The Analysis and Review of Mobile Surveillance Video Based on AVS-S
p.634

Elementary Discussion on Data Management of the Internet of Things
p.640

The Constructing of Test Set on Chinese Information Retrieval
p.645

Discussion of Classification for Imbalanced Data Sets

Abstract:

Most classifiers lose effectiveness when the class distribution is imbalanced, yet such imbalance is common and often significant in practice. Consequently, the problem of learning from imbalanced data sets has attracted growing attention in recent years. This paper provides a comprehensive review of the classification of imbalanced data sets, covering the nature of the problem, the factors that affect it, the assessment metrics currently used to evaluate learning performance, and the opportunities and challenges in learning from imbalanced data.
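As a minimal illustration of why plain accuracy can mislead on imbalanced data, the Python sketch below scores two hypothetical classifiers on a made-up 95:5 data set. Alongside accuracy, it computes minority-class metrics that are common in the imbalanced-learning literature (precision, recall, F-measure, and G-mean). The data, the function names `confusion_counts` and `imbalance_metrics`, and this particular choice of metrics are assumptions for illustration. They may not match the exact set of metrics the paper reviews.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: accuracy plus minority-class metrics."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0        # true-positive rate (minority class)
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true-negative rate (majority class)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    g_mean = sqrt(recall * specificity)                # geometric mean of both class rates
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_measure": f_measure, "g_mean": g_mean}


if __name__ == "__main__":
    # Made-up data: 95 majority (0) and 5 minority (1) examples.
    y_true = [0] * 95 + [1] * 5

    # Classifier A always predicts the majority class.
    always_majority = [0] * 100
    # Accuracy is 0.95, but recall, F-measure and G-mean are all 0.0.
    print(imbalance_metrics(y_true, always_majority))

    # Classifier B flags 10 majority examples by mistake but finds 4 of the 5 minority ones.
    detector = [1] * 10 + [0] * 85 + [1] * 4 + [0]
    print({k: round(v, 3) for k, v in imbalance_metrics(y_true, detector).items()})
```

Classifier A scores higher on accuracy while never detecting a minority example. Classifier B gives up some accuracy in exchange for much higher recall and G-mean, which is usually the trade-off that matters when the minority class is the class of interest.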

Info:

Periodical:

Advanced Materials Research (Volumes 546-547)

Pages:

622-627

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.546-547.622

Online since:

July 2012

Authors:

Wei Mei Zhi, Hua Ping Guo, Ming Fan

Keywords:

Classification, Cost-Sensitive Learning, Imbalanced Data Sets, Sampling Methods

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
