Hybrid Balancing Technique Using GRSOM and Bootstrap Algorithms for Classifiers with Imbalanced Data

Abstract:

To deal with imbalanced data, this paper proposes a hybrid data balancing technique that combines over-sampling and under-sampling. The technique determines how much the minority data should be grown and how much the majority data should be reduced, so that the noise introduced by excessive over-sampling can be avoided. It also helps determine an appropriate size for the balanced data, which reduces the computation time required to construct classifiers. The technique over-samples the minority data with the GRSOM method and then under-samples the majority data with bootstrap sampling. GRSOM is used in this study because it grows new samples in a non-linear fashion and preserves the original data structure. The proposed method is tested on four data sets from the UCI Machine Learning Repository. Once the data sets are balanced, a committee of classifiers is constructed from the balanced data. The experimental results show that the proposed data balancing method provides the best performance.
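
The two-sided balancing idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. GRSOM is not reproduced here: the function `oversample_stand_in` is a hypothetical placeholder that interpolates linearly between random pairs of minority samples, whereas the abstract states that GRSOM grows samples non-linearly. The target size `per_class`, the toy Gaussian data, and the choice of one balanced set per committee member are likewise assumptions made only for illustration; the abstract does not say how the paper sets them.

```python
import random


def oversample_stand_in(minority, n_new, rng):
    """Hypothetical placeholder for GRSOM (NOT the paper's method):
    create n_new points by linear interpolation between random minority pairs."""
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        b = rng.choice(minority)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


def bootstrap_undersample(majority, n_keep, rng):
    """Bootstrap sampling: draw n_keep majority samples with replacement."""
    return [rng.choice(majority) for _ in range(n_keep)]


def hybrid_balance(majority, minority, per_class, rng):
    """Grow the minority class and shrink the majority class to per_class each.
    per_class is an assumed, user-chosen target size."""
    n_new = max(0, per_class - len(minority))
    grown = list(minority) + oversample_stand_in(minority, n_new, rng)
    reduced = bootstrap_undersample(majority, per_class, rng)
    return reduced, grown


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy two-dimensional data: 200 majority samples and 20 minority samples.
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]

per_class = 60  # assumed target between the two original class sizes

# Assumption: each committee member gets its own balanced data set.
committee_sets = [
    hybrid_balance(majority, minority, per_class, rng) for _ in range(5)
]

for reduced, grown in committee_sets:
    print(len(reduced), len(grown))
```

Choosing a target size between the two original class sizes keeps the number of synthetic minority samples modest while also shrinking the training set, which is the trade-off the abstract describes. Each pair in `committee_sets` could then be used to train one classifier of a committee with any learning algorithm.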

Info:

Periodical:

Advanced Materials Research (Volumes 931-932)

Pages:

1375-1381

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.931-932.1375

Online since:

May 2014

Authors:

Sirorat Pattanapairoj, Danaipong Chetchotsak*, Banchar Arnonkijpanich

Keywords:

Committee Networks, Data Balancing Technique, Imbalanced Data

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved

* - Corresponding Author
