Classification with Local Clustering in Imbalanced Data Sets
In many real-world domains, learning from imbalanced data sets is unavoidable. A skewed class distribution is challenging for traditional classifiers, which achieve much lower classification accuracy on rare classes. To address this problem, we propose a novel classification method with local clustering, based on the data distribution of the imbalanced data set. First, we divide the whole data set into several groups based on the data distribution. Then we perform local clustering within each group, on both the normal class and the disjoint rare class. For the rare class, over-sampling is subsequently applied at different rates. Finally, we apply support vector machines (SVMs) for classification, using the traditional cost-matrix tactic to enhance classification accuracy. Experimental results on several UCI data sets show that this method can produce much higher prediction accuracy on the rare class than state-of-the-art methods.
Helen Zhang, Gang Shen and David Jin
H. Ji and H. X. Zhang, "Classification with Local Clustering in Imbalanced Data Sets", Advanced Materials Research, Vols. 219-220, pp. 151-155, 2011
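
The clustering and over-sampling steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which clustering algorithm is used or how the over-sampling rates are chosen. The sketch assumes plain k-means for the local clustering and random duplication that grows every rare-class cluster to a common size, so that smaller clusters receive a higher rate. The names `kmeans`, `oversample_rare_clusters`, and `target_per_cluster` are hypothetical.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples.

    Returns one cluster label per point. Assumption: the paper's
    local clustering is replaced here by ordinary k-means.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def oversample_rare_clusters(rare_points, k, target_per_cluster, seed=0):
    """Cluster the rare class, then duplicate points inside each cluster.

    Assumption: every cluster is grown to `target_per_cluster` points,
    so the over-sampling rate differs from cluster to cluster.
    """
    rng = random.Random(seed)
    labels = kmeans(rare_points, k, seed=seed)
    clusters = defaultdict(list)
    for p, lab in zip(rare_points, labels):
        clusters[lab].append(p)
    out = []
    for members in clusters.values():
        out.extend(members)  # keep all original rare samples
        deficit = max(0, target_per_cluster - len(members))
        # Random duplication within the cluster fills the deficit.
        out.extend(rng.choice(members) for _ in range(deficit))
    return out


# Toy rare class: one tight group of three points and one isolated point.
rare = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
balanced_rare = oversample_rare_clusters(rare, k=2, target_per_cluster=4)
print(len(balanced_rare))
```

In the full pipeline, the over-sampled rare class would be combined with the normal-class data and passed to a cost-sensitive SVM. One possible stand-in for the cost matrix is the `class_weight` parameter of scikit-learn's `SVC`, though the abstract does not specify the tooling used.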