Graph Based Semi-Supervised Learning Method for Imbalanced Dataset

Abstract:

In real-world applications, datasets are often highly imbalanced: some classes have far more instances than others. A classifier trained on such data tends to adapt to the majority class, achieving high predictive accuracy on the majority class but poor accuracy on the minority class. To address this problem, we propose a novel graph-based semi-supervised learning method for imbalanced datasets, called GSMID. GSMID characterizes the class equilibrium constraint as the smoothness of class labels. It seeks the optimal assignment of class membership to unlabeled samples by maximizing the correlations of classes while keeping the assignment as smooth as possible on the instance graph. Experiments comparing GSMID with SVM and other graph-based semi-supervised learning methods on several real-world datasets show that GSMID effectively improves classification accuracy on imbalanced datasets, especially when the data are highly skewed.
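The abstract does not give GSMID's objective function, so the following is not the authors' algorithm. It is a minimal sketch of the general family the abstract refers to: labels are spread over an instance graph so that strongly connected samples receive similar labels. The class-balancing step (scaling each labeled seed by the inverse of its class's labeled count) is an assumed stand-in for a class equilibrium constraint, and the names `rbf_graph`, `propagate`, `sigma`, `alpha`, and `balance` are hypothetical.

```python
import math

def rbf_graph(points, sigma=1.0):
    """Dense instance graph with Gaussian edge weights and a zero diagonal."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W

def propagate(W, labels, n_classes, alpha=0.9, iters=200, balance=True):
    """Iterative label spreading; labels[i] is a class index or None."""
    n = len(W)
    deg = [sum(row) for row in W]
    # Symmetric normalization S = D^(-1/2) W D^(-1/2).
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    counts = [0] * n_classes
    for lab in labels:
        if lab is not None:
            counts[lab] += 1
    # Seed matrix Y. With balance=True every class injects the same total
    # mass, however many labeled samples it has (assumed heuristic).
    Y = [[0.0] * n_classes for _ in range(n)]
    for i, lab in enumerate(labels):
        if lab is not None:
            Y[i][lab] = 1.0 / counts[lab] if balance else 1.0
    F = [row[:] for row in Y]
    for _ in range(iters):
        # Smoothness term (neighbors' scores) plus fidelity to the seeds.
        F = [[alpha * sum(S[i][j] * F[j][c] for j in range(n))
              + (1.0 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]

# Toy imbalanced data: six majority points near 0, two minority points near 5.
points = [(0.0,), (0.2,), (0.4,), (0.6,), (0.8,), (1.0,), (5.0,), (5.2,)]
labels = [0, 0, 0, None, None, None, 1, None]
preds = propagate(rbf_graph(points), labels, n_classes=2)
```

In this toy setup the single labeled minority sample is the only seed for its class, so the balancing step gives it as much total influence as the three majority seeds combined.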

Info:

Pages: 4040-4044

Online since: May 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
