Graph Based Semi-Supervised Learning Method for Imbalanced Dataset

Chen Guang Zhang; Yan Zhang; Xia Huan Zhang

doi:10.4028/www.scientific.net/AMM.556-562.4040

Abstract:

In real-world applications, datasets are often highly imbalanced: some classes contain far more instances than others. A classifier trained on such data tends to adapt to the majority class, which can yield high predictive accuracy on the majority class but poor accuracy on the minority class. To address this problem, we propose a novel graph-based semi-supervised learning method for imbalanced datasets, called GSMID. GSMID characterizes the class equilibrium constraint as the smoothness of class labels. It is expected to derive the optimal assignment of class memberships to unlabeled samples by maximizing the correlations of classes while keeping the assignment as smooth as possible on the instance graph. Experiments comparing GSMID with SVM and other graph-based semi-supervised learning methods on several real-world datasets show that GSMID can effectively improve classification accuracy on imbalanced datasets, especially when the data is highly skewed.
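The abstract does not give the GSMID objective, so the following is not the authors' method. It is a minimal sketch of a generic graph-based semi-supervised baseline (closed-form label spreading on a normalized affinity graph), included only to illustrate what "smooth on the instance graph" can mean in practice. The function names, the RBF affinity, the parameters `alpha` and `sigma`, and the `balance` rescaling of the seed labels by labeled-class counts are all assumptions made for illustration.

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    # Dense RBF affinity matrix with a zero diagonal (no self-loops).
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def propagate_labels(X, y, n_classes, alpha=0.9, sigma=1.0, balance=True):
    """Generic label spreading; y holds class indices, or -1 if unlabeled.

    Illustrative baseline only, not GSMID.
    """
    n = X.shape[0]
    W = rbf_affinity(X, sigma)
    # Symmetrically normalized affinity S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # One-hot seed matrix for the labeled samples.
    Y = np.zeros((n, n_classes))
    idx = np.flatnonzero(y >= 0)
    Y[idx, y[idx]] = 1.0
    if balance:
        # Assumed rebalancing: every class with labels injects the same
        # total label mass, regardless of how many labels it has.
        Y = Y / np.maximum(Y.sum(axis=0), 1.0)
    # Closed-form solution F = (I - alpha * S)^{-1} Y.
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)

# Toy imbalanced data: 40 majority points, 8 minority points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (40, 2)), rng.normal(10.0, 0.5, (8, 2))])
truth = np.array([0] * 40 + [1] * 8)
y = np.full(48, -1)
y[:10] = 0   # ten labeled majority samples
y[40] = 1    # a single labeled minority sample
pred = propagate_labels(X, y, n_classes=2)
```

On this well-separated toy data the two clusters are effectively disconnected in the graph, so the labels spread only within each cluster. The example therefore checks the mechanics of propagation and says nothing about how GSMID itself behaves on skewed data.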

Info:

Periodical: Applied Mechanics and Materials (Volumes 556-562)
Pages: 4040-4044
DOI: https://doi.org/10.4028/www.scientific.net/AMM.556-562.4040
Online since: May 2014
Authors: Chen Guang Zhang, Yan Zhang*, Xia Huan Zhang
Keywords: Class Correlation, Graph Semi-Supervised Learning, Imbalanced Dataset

* - Corresponding Author
