The Impact of Sample Size on Imbalance Learning

Abstract:

Classification of imbalanced data sets arises in many real-life applications, yet most state-of-the-art classification methods assume the data are relatively balanced and lose effectiveness when they are not. This paper discusses the factors that influence building a classifier capable of identifying rare events, focusing in particular on sample size. Carefully designed experiments in Weka, using Rotation Forest as the base classifier on three data sets from the UCI Machine Learning Repository, show that at a fixed imbalance ratio, enlarging the training set by unsupervised resampling reduces the large error rate caused by the imbalanced class distribution, so that a common classification algorithm can achieve good results.
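The procedure the abstract describes is straightforward to sketch. Below is a minimal illustration in Python with scikit-learn, under stated assumptions: the data set is synthetic (a placeholder for the UCI data sets used in the paper), Rotation Forest is not available in scikit-learn so RandomForestClassifier stands in for the base classifier, and sklearn.utils.resample plays the role of Weka's unsupervised Resample filter, drawing instances with replacement without consulting class labels so the imbalance ratio stays roughly fixed while the training set grows.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic imbalanced data (roughly 19:1), a stand-in for the UCI data sets.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for factor in (1, 2, 4):
    # Unsupervised resampling: draw instances with replacement, blind to labels,
    # so the class distribution stays approximately fixed as the set grows.
    X_res, y_res = resample(X_tr, y_tr, replace=True,
                            n_samples=factor * len(X_tr), random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_res, y_res)
    acc = balanced_accuracy_score(y_te, model.predict(X_te))
    print(f"training set x{factor}: balanced accuracy = {acc:.3f}")

Whether and how much the error rate drops depends on the classifier and the imbalance ratio; this sketch only mirrors the shape of the experiment and does not reproduce the paper's Rotation Forest results.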

Info:

Periodical:

Advanced Materials Research (Volumes 756-759)

Pages:

2547-2551

Online since:

September 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved

References:

[1] P.-N. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining, Addison-Wesley, pp. 127-187, (2005).

[2] T. Fawcett and F. Provost, Combining Data Mining and Machine Learning for Effective User Profiling, Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, pp. 8-13, (1996).

[3] D. Lewis and J. Catlett, Heterogeneous Uncertainty Sampling for Supervised Learning, Proceedings of the 11th International Conference on Machine Learning (ICML'94), pp. 148-156, (1994).

[4] Y. Sun, M. S. Kamel and A. K. C. Wong, Cost-sensitive boosting for classification of imbalanced data, Pattern Recognition, vol. 40, pp. 3358-3378, (2007).

DOI: 10.1016/j.patcog.2007.04.009

[5] S. Visa and A. Ralescu, Issues in Mining Imbalanced Data Sets - A Review Paper, Proc. of the Midwest Artificial Intelligence and Cognitive Science Conference, pp. 67-73, (2005).

[6] G. Weiss and F. Provost, Learning when training data are costly: the effect of class distribution on tree induction, J. Artif. Intell. Res., vol. 19, pp. 315-354, (2003).

DOI: 10.1613/jair.1199

[7] K. Ezawa, M. Singh and S. W. Norton, Learning goal-oriented Bayesian networks for telecommunications risk management, Proceedings of the Thirteenth International Conference on Machine Learning, Bari, Italy, pp. 139-147, (1996).

[8] N. V. Chawla, K. W. Bowyer, L. O. Hall and W. P. Kegelmeyer, SMOTE: Synthetic Minority Over-Sampling Technique, J. Artificial Intelligence Research, vol. 16, pp. 321-357, (2002).

DOI: 10.1613/jair.953

[9] R. Agarwal and M. V. Joshi, PNrule: A New Framework for Learning Classifier Models in Data Mining (A Case-Study in Network Intrusion Detection), Technical Report TR 00-01, Department of Computer Science, University of Minnesota, USA, (2000).

DOI: 10.1137/1.9781611972719.29

[10] D. A. Cieslak and N. V. Chawla, Learning Decision Trees for Unbalanced Data, Proc. of ECML PKDD 2008, pp. 241-256, (2008).
