Abstract and Applied Analysis
Volume 2014 (2014), Article ID 972786, 7 pages
http://dx.doi.org/10.1155/2014/972786
Research Article

A Hybrid Sampling SVM Approach to Imbalanced Data Classification

Qiang Wang

College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China

Received 17 April 2014; Accepted 22 May 2014; Published 12 June 2014

Academic Editor: Fuding Xie

Copyright © 2014 Qiang Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. Z.-H. Zhou and X.-Y. Liu, “Training cost-sensitive neural networks with methods addressing the class imbalance problem,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, pp. 63–77, 2006.
  2. H. B. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263–1284, 2009.
  3. B. X. Wang and N. Japkowicz, “Boosting support vector machines for imbalanced data sets,” Knowledge and Information Systems, vol. 25, no. 1, pp. 1–20, 2010.
  4. V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
  5. M. Kubat and S. Matwin, “Addressing the curse of imbalanced training sets: one-sided selection,” in Proceedings of the International Conference on Machine Learning, pp. 179–186, Nashville, Tenn, USA, 1997.
  6. N. Japkowicz and S. Stephen, “The class imbalance problem: a systematic study,” Intelligent Data Analysis, vol. 6, no. 5, pp. 429–450, 2002.
  7. X.-Y. Liu, J. Wu, and Z.-H. Zhou, “Exploratory undersampling for class-imbalance learning,” IEEE Transactions on Systems, Man, and Cybernetics B: Cybernetics, vol. 39, no. 2, pp. 539–550, 2009.
  8. M. S. Kim, “An effective under-sampling method for class imbalance data problem,” in Proceedings of the 8th International Symposium on Advanced Intelligent Systems (ISIS '07), pp. 825–829, Sokcho-City, Republic of Korea, 2007.
  9. S. García and F. Herrera, “Evolutionary undersampling for classification with imbalanced datasets: proposals and taxonomy,” Evolutionary Computation, vol. 17, no. 3, pp. 275–306, 2009.
  10. S.-J. Yen and Y.-S. Lee, “Cluster-based under-sampling approaches for imbalanced data distributions,” Expert Systems with Applications, vol. 36, no. 3, pp. 5718–5727, 2009.
  11. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, no. 1, pp. 321–357, 2002.
  12. N. V. Chawla, A. Lazarevic, L. O. Hall, and K. W. Bowyer, “SMOTEBoost: improving prediction of the minority class in boosting,” in Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD '03), pp. 107–119, Cavtat, Croatia, September 2003.
  13. H. Han, W. Wang, and B. Mao, “Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning,” in Proceedings of the International Conference on Intelligent Computing (ICIC '05), vol. 3644 of Lecture Notes in Computer Science, pp. 878–887, Hefei, China, August 2005.
  14. M. Gao, X. Hong, S. Chen, and C. J. Harris, “Probability density function estimation based over-sampling for imbalanced two-class problems,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '12), Brisbane, Australia, June 2012.
  15. H. X. Zhang and M. F. Li, “RWO-Sampling: a random walk over-sampling approach to imbalanced data classification,” Information Fusion, vol. 20, pp. 99–116, 2014.
  16. Y. X. Peng and J. Yao, “AdaOUBoost: adaptive over-sampling and under-sampling to boost the concept learning in large scale imbalanced data sets,” in Proceedings of the ACM SIGMM International Conference on Multimedia Information Retrieval (MIR '10), pp. 111–118, Philadelphia, Pa, USA, March 2010.
  17. S. Cateni, V. Colla, and M. Vannucci, “A method for resampling imbalanced datasets in binary classification tasks for real-world problems,” Neurocomputing, vol. 135, pp. 32–41, 2014.
  18. P. Cao, J. Z. Yang, W. Li, D. Z. Zhao, and O. Zaiane, “Ensemble-based hybrid probabilistic sampling for imbalanced data learning in lung nodule CAD,” Computerized Medical Imaging and Graphics, vol. 38, no. 3, pp. 137–150, 2014.
  19. J. Luengo, A. Fernández, S. García, and F. Herrera, “Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling,” Soft Computing, vol. 15, no. 10, pp. 1909–1936, 2011.
  20. Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, “Cost-sensitive boosting for classification of imbalanced data,” Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.
  21. J. P. Hwang, S. Park, and E. Kim, “A new weighted approach to imbalanced data classification problem via support vector machine with quadratic cost function,” Expert Systems with Applications, vol. 38, no. 7, pp. 8580–8585, 2011.
  22. S. Z. Wang, Z. J. Li, W. H. Chao, and Q. H. Cao, “Applying adaptive over-sampling technique based on data density and cost-sensitive SVM to imbalanced learning,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '12), Brisbane, Australia, June 2012.
  23. R. Akbani, S. Kwek, and N. Japkowicz, “Applying support vector machines to imbalanced datasets,” in Proceedings of the 15th European Conference on Machine Learning (ECML '04), pp. 39–50, Pisa, Italy, September 2004.
  24. Y. Tang, Y.-Q. Zhang, and N. V. Chawla, “SVMs modeling for highly imbalanced classification,” IEEE Transactions on Systems, Man, and Cybernetics B: Cybernetics, vol. 39, no. 1, pp. 281–288, 2009.
  25. R. Batuwita and V. Palade, “FSVM-CIL: fuzzy support vector machines for class imbalance learning,” IEEE Transactions on Fuzzy Systems, vol. 18, no. 3, pp. 558–571, 2010.
  26. Y. H. Shao, W. J. Chen, J. J. Zhang, Z. Wang, and N. Y. Deng, “An efficient weighted Lagrangian twin support vector machine for imbalanced data classification,” Pattern Recognition, vol. 47, no. 9, pp. 3158–3167, 2014.
  27. J. H. Fu and S. L. Lee, “Certainty-based active learning for sampling imbalanced datasets,” Neurocomputing, vol. 119, pp. 350–358, 2013.
  28. S.-H. Oh, “Error back-propagation algorithm for classification of imbalanced data,” Neurocomputing, vol. 74, no. 6, pp. 1058–1061, 2011.
  29. B. Krawczyk, M. Wozniak, and G. Schaefer, “Cost-sensitive decision tree ensembles for effective imbalanced classification,” Applied Soft Computing, vol. 14, pp. 554–562, 2014.
  30. M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, “A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches,” IEEE Transactions on Systems, Man, and Cybernetics C: Applications and Reviews, vol. 42, no. 4, pp. 463–484, 2012.
  31. Y. Liu, X. H. Yu, J. X. Huang, and A. J. An, “Combining integrated sampling with SVM ensembles for learning from imbalanced datasets,” Information Processing & Management, vol. 47, no. 4, pp. 617–631, 2011.
  32. H. Guo and H. L. Viktor, “Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach,” SIGKDD Explorations, vol. 6, no. 1, pp. 30–39, 2004.
  33. S. Oh, M. S. Lee, and B. T. Zhang, “Ensemble learning with active example selection for imbalanced biomedical data classification,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 2, pp. 316–325, 2011.
  34. M. Woźniak, M. Grana, and E. Corchado, “A survey of multiple classifier systems as hybrid systems,” Information Fusion, vol. 16, pp. 3–17, 2014.
  35. B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin classifiers,” in Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pp. 144–152, Pittsburgh, Pa, USA, July 1992.
  36. P. Flach, “ROC analysis for ranking and probability estimation,” Notes from the UAI tutorial on ROC analysis, 2007.
  37. J. Alcalá-Fdez, A. Fernández, J. Luengo et al., “KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework,” Journal of Multiple-Valued Logic and Soft Computing, vol. 17, no. 2-3, pp. 255–287, 2011.