Combining integrated sampling with SVM ensembles for learning from imbalanced datasets

Authors:	Yang Liu, Xiaohui Yu, Jimmy Xiangji Huang, Aijun An

Institution:	1. School of Computer Science and Technology, Shandong University, Jinan, China; 2. School of Information Technology, York University, Toronto, Canada; 3. Department of Computer Science and Engineering, York University, Toronto, Canada

Abstract:	Learning from imbalanced datasets is difficult. Because the minority class is represented by so few examples, there is too little information to form a clear picture of the inherent structure of the dataset. Most existing classification methods tend to perform poorly on minority class examples when the dataset is extremely imbalanced, because they optimize overall accuracy without considering the relative distribution of each class. In this paper, we study the performance of SVMs, which have been highly successful in many real applications, in the imbalanced data context. Through empirical analysis, we show that SVMs may suffer from biased decision boundaries, and that their prediction performance drops dramatically when the data is highly skewed. We propose combining an integrated sampling technique, which incorporates both over-sampling and under-sampling, with an ensemble of SVMs to improve prediction performance. Extensive experiments show that our method outperforms individual SVMs as well as several other state-of-the-art classifiers.

Keywords:	Data sampling; Classification; Imbalanced data mining
This article is indexed in ScienceDirect and other databases.
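
The abstract describes combining over-sampling, under-sampling, and an ensemble of classifiers, but gives no procedural detail. The following is a minimal Python sketch of one way such a pipeline could be wired together, not the authors' implementation. Several choices are assumptions made only for illustration: over-sampling is plain random duplication, under-sampling splits the majority class into disjoint chunks, the ensemble combines members by majority vote, and the names `integrated_sampling`, `train_ensemble`, `predict_majority_vote`, and `NearestCentroid` are hypothetical. `NearestCentroid` is a dependency-free stand-in for the base learner and is not an SVM.

```python
import random


def integrated_sampling(majority, minority, n_models, oversample_factor, rng):
    """Build one training set per ensemble member (labels: 0 = majority, 1 = minority).

    Assumption: over-sampling is random duplication and under-sampling is a
    random split of the majority class into n_models disjoint chunks.
    """
    # Over-sample: add randomly chosen copies of minority examples.
    n_extra = len(minority) * (oversample_factor - 1)
    boosted = list(minority) + [rng.choice(minority) for _ in range(n_extra)]

    # Under-sample: shuffle the majority class and deal it into disjoint chunks.
    shuffled = list(majority)
    rng.shuffle(shuffled)
    chunks = [shuffled[i::n_models] for i in range(n_models)]

    # Each member sees one majority chunk plus the whole boosted minority set.
    return [
        (chunk + boosted, [0] * len(chunk) + [1] * len(boosted))
        for chunk in chunks
    ]


def train_ensemble(training_sets, make_classifier):
    """Fit one fresh classifier (anything with fit/predict) per training set."""
    return [make_classifier().fit(X, y) for X, y in training_sets]


def predict_majority_vote(models, X):
    """Predict 1 when more than half of the members vote 1; ties go to 0."""
    votes_per_model = [list(m.predict(X)) for m in models]
    return [
        1 if 2 * sum(votes) > len(votes) else 0
        for votes in zip(*votes_per_model)
    ]


class NearestCentroid:
    """Stand-in base learner (NOT an SVM) so the sketch runs without dependencies."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self  # returning self lets train_ensemble chain the call

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))
            for x in X
        ]
```

To use SVMs as the base learner, the factory passed to `train_ensemble` could be something like `lambda: SVC(kernel="rbf")` with scikit-learn's `sklearn.svm.SVC`, which follows the same `fit`/`predict` convention; kernel and hyperparameters would need tuning for the data at hand.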