Paper Title:
A Comparison Study of Cost-Sensitive Learning and Sampling Methods on Imbalanced Data Sets
  Abstract

A classifier built from a data set with a highly skewed class distribution generally predicts an unknown sample as the majority class much more often than as the minority class, because classifiers are designed to maximize overall classification accuracy. We compare three methods for classifying data sets that have an imbalanced class distribution and non-uniform misclassification costs: cost-sensitive learning, in which the misclassification cost is embedded in the algorithm; over-sampling; and under-sampling. Our aim is to determine which of the three produces the best overall classification under any given circumstance. We reach the following conclusions: 1. Cost-sensitive learning is suitable for classifying imbalanced data sets. It outperforms the sampling methods overall and is more stable than they are, except when the data set is quite small. 2. If the data set is highly skewed or quite small, over-sampling methods may be better.
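The three approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the random duplication and random removal strategies, and the expected-cost decision rule are all assumptions chosen for illustration. In particular, the paper embeds the misclassification cost in the learning algorithm itself, whereas `min_cost_label` below only applies the costs at prediction time as a simplified stand-in.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_oversample(samples, labels, seed=0):
    """Over-sampling sketch: duplicate randomly chosen samples of the
    smaller classes until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in xs + extra)
    return out


def random_undersample(samples, labels, seed=0):
    """Under-sampling sketch: keep a random subset of the larger classes
    so that every class is as small as the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        out.extend((x, y) for x in rng.sample(xs, target))
    return out


def min_cost_label(p_minority, cost_fn, cost_fp):
    """Cost-sensitive decision sketch (assumed rule, not the paper's).

    p_minority: estimated probability that the sample is minority class.
    cost_fn: cost of predicting 'majority' for a true minority sample.
    cost_fp: cost of predicting 'minority' for a true majority sample.
    Returns the label with the lower expected misclassification cost.
    """
    expected_cost_if_majority = p_minority * cost_fn
    expected_cost_if_minority = (1.0 - p_minority) * cost_fp
    if expected_cost_if_minority < expected_cost_if_majority:
        return "minority"
    return "majority"


def class_counts(pairs):
    """Count how many (sample, label) pairs fall into each class."""
    return Counter(y for _, y in pairs)
```

With nine majority samples and one minority sample, `random_oversample` returns nine samples of each class and `random_undersample` returns one of each. With unequal costs, `min_cost_label` can choose the minority class even when its estimated probability is well below 0.5, which is the effect a plain accuracy-maximizing classifier lacks.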

  Info
Periodical
Advanced Materials Research (Volumes 271-273)
Edited by
Junqiao Xiong
Pages
1291-1296
DOI
10.4028/www.scientific.net/AMR.271-273.1291
Citation
J. W. Zhang, H. J. Lu, W. T. Chen, Y. Lu, "A Comparison Study of Cost-Sensitive Learning and Sampling Methods on Imbalanced Data Sets", Advanced Materials Research, Vols. 271-273, pp. 1291-1296, 2011
Online since
July 2011

Authors: Kai Li, Hong Tao Gao
Abstract:To improve the generalization performance for ensemble learning, a subgraph based selective classifier ensemble algorithm is presented....
261
Authors: Fei Fei Xia, Qi Feng Zhou
Abstract:A new SVM (Support Vector Machine) classifier-combination model, based on Hierarchical Partition approach, for enterprise credit assessment...
57
Authors: Xiao Lin Chen, Yan Jiang, Min Jie Chen, Yong Yu, Hong Ping Nie, Min Li
Chapter 6: Engineering Material, Mechanical Engineering and Applied Mechanics
Abstract: A lot of cost-sensitive support vector machine methods are used to handle the imbalanced datasets, but the obtained results are not as...
1342
Authors: Wei Mei Zhi, Hua Ping Guo, Ming Fan
Chapter 4: Data, Image and Signal Processing
Abstract:Most classifiers lose efficiency with the problem of imbalanced class distribution, which, however, often shows statistical significant in...
622
Authors: Mu Hee Song
Chapter 12: Applications of Information Technology and Computer in Industry
Abstract:Due to the distribution of personal computers and the internet, E-mail has become one of the most widely used communicative means. However, a...
1844