Paper Title:
Classification with Local Clustering in Imbalanced Data Sets
  Abstract

Learning from imbalanced data sets is a common problem in many real-world domains. Because a skewed class distribution leads traditional classifiers to much lower classification accuracy on rare classes, we propose a novel classification method with local clustering, based on the data distribution of the imbalanced data set. First, we divide the whole data set into several data groups based on the data distribution. Then we perform local clustering within each group, on both the normal class and the disjoint rare class. For the rare class, over-sampling is then applied at different rates. Finally, we apply support vector machines (SVMs) for classification, using the traditional cost-matrix approach to improve classification accuracy. Experimental results on several UCI data sets show that this method can produce much higher prediction accuracy on the rare class than state-of-the-art methods.
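
The clustering and over-sampling steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which clustering algorithm is used or how the over-sampling rates are chosen, so the sketch assumes plain k-means and assumes each rare-class cluster is replicated up to a common target size (so smaller clusters get a higher rate). The names `kmeans`, `oversample_by_cluster`, and `target_per_cluster` are hypothetical.

```python
import random


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on a list of equal-length tuples.

    Returns one cluster index per point. Initialising the centers with the
    first k points is an assumption made to keep the sketch deterministic.
    """
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Recompute each center as the mean of its members; keep it if empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def oversample_by_cluster(rare_points, k, target_per_cluster, seed=0):
    """Cluster the rare class, then duplicate random members of each cluster
    until it reaches target_per_cluster points (assumed rate rule)."""
    rng = random.Random(seed)
    labels = kmeans(rare_points, k)
    out = []
    for c in range(k):
        members = [p for p, lab in zip(rare_points, labels) if lab == c]
        if not members:
            continue
        out.extend(members)
        shortfall = max(0, target_per_cluster - len(members))
        out.extend(rng.choice(members) for _ in range(shortfall))
    return out


# Toy rare class with two sub-groups of unequal size (made-up data).
rare = [(0.0, 0.0), (10.0, 10.0), (0.1, 0.2), (0.2, 0.1), (10.1, 9.9)]
balanced = oversample_by_cluster(rare, k=2, target_per_cluster=4)
print(len(balanced))  # 8 with this toy data: both clusters padded to 4 points
```

In the pipeline the abstract describes, the over-sampled rare class and the normal class would then be passed to an SVM trained with class-dependent misclassification costs; that step is omitted here.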

  Info
Periodical
Advanced Materials Research (Volumes 219-220)
Edited by
Helen Zhang, Gang Shen and David Jin
Pages
151-155
DOI
10.4028/www.scientific.net/AMR.219-220.151
Citation
H. Ji, H. X. Zhang, "Classification with Local Clustering in Imbalanced Data Sets", Advanced Materials Research, Vols. 219-220, pp. 151-155, 2011
Online since
March 2011

Authors: Zhou Suo Zhang, Minghui Shen, Wenzhi Lv, Zheng Jia He
Abstract: Aiming at the problem that the development of intelligent machinery fault diagnosis is limited by the need for many fault data samples, this paper...
483
Authors: Liang Jun Li, Bin Zhang, Yuan Yuan Che, Ming Yang, Tie Nan Li
Abstract:In text association classification research, feature distribution of the training sample collection impacts greatly on the classification...
246
Authors: Su Qun Cao, Yun Feng Bu
Abstract:Scatter matrix based class separability criterion is commonly used in supervised feature extraction. But calculations of scatter matrixes...
409
Authors: Xiao Yun Chen, Jin Hua Chen
Abstract:There is a problem that the difficulty in text classification will increase when the number of classes increases, to which hierarchical...
2233
Authors: Xiao Lin Chen, Yan Jiang, Min Jie Chen, Yong Yu, Hong Ping Nie, Min Li
Abstract: Many cost-sensitive support vector machine methods are used to handle imbalanced data sets, but the obtained results are not as...
1342