An Empirical Study of Boosting Methods on Severely Imbalanced Data


Abstract:

Real-world applications now generate large volumes of data, which poses great challenges to class-imbalance learning: a large number of majority-class examples combined with severe class imbalance. Previous studies on class-imbalance learning mainly focused on relatively small or moderate class imbalance. In this paper we conduct an empirical study to explore the differences between learning with small or moderate class imbalance and learning with severe class imbalance. The experimental results show that: (1) traditional methods cannot handle severe class imbalance effectively; (2) AUC, G-mean and F-measure can be very inconsistent under severe class imbalance, which seldom happens when the imbalance is moderate, and G-mean is not appropriate for severe class-imbalance learning because it is insensitive to changes in the imbalance ratio; (3) when AUC and G-mean are the evaluation metrics, EasyEnsemble is the best method, followed by BalanceCascade and under-sampling; (4) sampling to slightly below full balance works better for under-sampling under severe class imbalance, and it is important to control false positives when designing methods for severe class imbalance.
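The inconsistency between G-mean and F-measure noted in finding (2) follows from their definitions: G-mean depends only on the two class-conditional rates (sensitivity and specificity), while F-measure depends on precision, which collapses as false positives grow relative to a small positive class. A minimal sketch with illustrative numbers (not taken from the paper's experiments):

```python
import math

def imbalance_metrics(tp, fn, fp, tn):
    """Compute G-mean and F-measure from a binary confusion matrix,
    treating the minority class as the positive class."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)
    return g_mean, f_measure

# A severely imbalanced test set: 10 positives vs. 990 negatives.
# Only 10 false positives barely dent specificity (980/990), so
# G-mean stays high, but precision drops to 8/18 and F-measure falls.
g, f = imbalance_metrics(tp=8, fn=2, fp=10, tn=980)
```

With these numbers G-mean is about 0.89 while F-measure is about 0.57, illustrating how the two metrics can rank methods very differently when the negative class dwarfs the positive class.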

Pages: 2510-2513

Online since: February 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved

