Mathematical Problems in Engineering
Volume 2013 (2013), Article ID 761814, 12 pages
http://dx.doi.org/10.1155/2013/761814
Research Article

An Empirical Study on the Performance of Cost-Sensitive Boosting Algorithms with Different Levels of Class Imbalance

School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China

Received 10 May 2013; Accepted 13 August 2013

Academic Editor: Reza Jazar

Copyright © 2013 Qing-Yan Yin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. G. Cohen, M. Hilario, H. Sax, S. Hugonnet, and A. Geissbuhler, “Learning from imbalanced data in surveillance of nosocomial infection,” Artificial Intelligence in Medicine, vol. 37, no. 1, pp. 7–18, 2006.
  2. P. K. Chan, W. Fan, A. L. Prodromidis, and S. J. Stolfo, “Distributed data mining in credit card fraud detection,” IEEE Intelligent Systems and Their Applications, vol. 14, no. 6, pp. 67–74, 1999.
  3. Z.-B. Zhu and Z.-H. Song, “Fault diagnosis based on imbalance modified kernel Fisher discriminant analysis,” Chemical Engineering Research and Design, vol. 88, no. 8, pp. 936–951, 2010.
  4. Z. Zheng, X. Wu, and R. Srihari, “Feature selection for text categorization on imbalanced data,” SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 80–89, 2004.
  5. N. García-Pedrajas, J. Pérez-Rodríguez, M. García-Pedrajas, D. Ortiz-Boyer, and C. Fyfe, “Class imbalance methods for translation initiation site recognition in DNA sequences,” Knowledge-Based Systems, vol. 25, no. 1, pp. 22–34, 2012.
  6. H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263–1284, 2009.
  7. V. López, A. Fernández, J. G. Moreno-Torres, and F. Herrera, “Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics,” Expert Systems with Applications, vol. 39, no. 7, pp. 6585–6608, 2012.
  8. M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, “A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches,” IEEE Transactions on Systems, Man and Cybernetics C, vol. 42, no. 4, pp. 463–484, 2012.
  9. J. Huang and C. X. Ling, “Using AUC and accuracy in evaluating learning algorithms,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 3, pp. 299–310, 2005.
  10. R. Ranawana and V. Palade, “Optimized precision - a new measure for classifier performance evaluation,” in Proceedings of the IEEE Congress on Evolutionary Computation (CEC '06), pp. 2254–2261, July 2006.
  11. T. K. Jo and N. Japkowicz, “Class imbalances versus small disjuncts,” SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 40–49, 2004.
  12. R. C. Prati, G. E. A. P. A. Batista, and M. C. Monard, “Learning with class skews and small disjuncts,” in Proceedings of the 17th Brazilian Symposium on Artificial Intelligence, pp. 296–306, 2004.
  13. R. Barandela, R. M. Valdovinos, J. S. Sánchez, and F. J. Ferri, “The imbalance training sample problem: under or over sampling,” in Structural, Syntactic, and Statistical Pattern Recognition, pp. 806–814, Springer, 2004.
  14. A. Estabrooks, T. Jo, and N. Japkowicz, “A multiple resampling method for learning from imbalanced data sets,” Computational Intelligence, vol. 20, no. 1, pp. 18–36, 2004.
  15. A. Estabrooks, A combination scheme for inductive learning from imbalanced data sets [M.S. thesis], Faculty of Computer Science, Dalhousie University, Nova Scotia, Canada, 2000.
  16. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
  17. J. Van Hulse, T. M. Khoshgoftaar, and A. Napolitano, “Experimental perspectives on learning from imbalanced data,” in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 935–942, June 2007.
  18. V. García, J. S. Sánchez, and R. A. Mollineda, “On the effectiveness of preprocessing methods when dealing with different levels of class imbalance,” Knowledge-Based Systems, vol. 25, no. 1, pp. 13–21, 2012.
  19. R. Barandela, J. S. Sánchez, V. García, and E. Rangel, “Strategies for learning in class imbalance problems,” Pattern Recognition, vol. 36, no. 3, pp. 849–851, 2003.
  20. G. Wu and E. Y. Chang, “KBA: kernel boundary alignment considering imbalanced data distribution,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 786–795, 2005.
  21. C. Elkan, “The foundations of cost-sensitive learning,” in Proceedings of the 17th International Joint Conference on Artificial Intelligence, pp. 973–978, Seattle, Wash, USA, 2001.
  22. K. M. Ting, “An instance-weighting method to induce cost-sensitive trees,” IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 3, pp. 659–665, 2002.
  23. B. Zadrozny, J. Langford, and N. Abe, “Cost-sensitive learning by cost-proportionate example weighting,” in Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM '03), pp. 435–442, Melbourne, Fla, USA, November 2003.
  24. Z.-H. Zhou and X.-Y. Liu, “Training cost-sensitive neural networks with methods addressing the class imbalance problem,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, pp. 63–77, 2006.
  25. W. Jing, D. Meng, C. Qiao, and Z. Peng, “Eliminating vertical stripe defects on silicon steel surface by L1/2 regularization,” Mathematical Problems in Engineering, vol. 2011, Article ID 854674, 13 pages, 2011.
  26. C. L. Castro and A. de Pádua Braga, “Novel cost-sensitive approach to improve the multilayer perceptron performance on imbalanced data,” IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 6, pp. 888–899, 2013.
  27. C. Seiffert, T. M. Khoshgoftaar, J. Van Hulse, and A. Napolitano, “RUSBoost: a hybrid approach to alleviating class imbalance,” IEEE Transactions on Systems, Man, and Cybernetics A, vol. 40, no. 1, pp. 185–197, 2010.
  28. X.-Y. Liu, J. Wu, and Z.-H. Zhou, “Exploratory undersampling for class-imbalance learning,” IEEE Transactions on Systems, Man, and Cybernetics B, vol. 39, no. 2, pp. 539–550, 2009.
  29. W. Fan, S. J. Stolfo, J. Zhang, and P. K. Chan, “Adacost: misclassification cost-sensitive boosting,” in Proceedings of the 6th International Conference on Machine Learning, pp. 97–105, Tahoe City, Calif, USA, 1999.
  30. K. M. Ting, “A comparative study of cost-sensitive boosting algorithms,” in Proceedings of the 17th International Conference on Machine Learning, pp. 983–990, San Francisco, Calif, USA, 2000.
  31. Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, “Cost-sensitive boosting for classification of imbalanced data,” Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.
  32. H. Masnadi-Shirazi and N. Vasconcelos, “Cost-sensitive boosting,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 2, pp. 294–309, 2011.
  33. X. Wang, S. Matwin, N. Japkowicz, and X. Liu, “Cost-sensitive boosting algorithms for imbalanced multi-instance datasets,” Advances in Artificial Intelligence, vol. 7884, pp. 174–186, 2013.
  34. Y. Zhang and D. P. Wang, “A cost-sensitive ensemble method for class-imbalanced datasets,” Abstract and Applied Analysis, vol. 2013, Article ID 196256, 6 pages, 2013.
  35. Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119–139, 1997.
  36. X. Wu, V. Kumar, J. R. Quinlan et al., “Top 10 algorithms in data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, 2008.
  37. R. Caruana and A. Niculescu-Mizil, “Data mining in metric space: an empirical analysis of supervised learning performance criteria,” in Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '04), pp. 69–78, Seattle, Wash, USA, August 2004.
  38. R. Alaiz-Rodriguez, N. Japkowicz, and P. Tischer, “A visualization-based exploratory technique for classifier comparison with respect to multiple metrics and multiple domains,” in Proceedings of the 15th European Conference on Machine Learning, pp. 660–665, Antwerp, Belgium, 2008.
  39. Y.-F. Li, J. T. Kwok, and Z.-H. Zhou, “Cost-sensitive semi-supervised support vector machine,” in Proceedings of the 24th AAAI Conference on Artificial Intelligence and the 22nd Innovative Applications of Artificial Intelligence Conference (IAAI '10), pp. 500–505, July 2010.
  40. V. García, J. S. Sánchez, and R. A. Mollineda, “Theoretical analysis of a performance measure for imbalanced data,” in Proceedings of the 20th International Conference on Pattern Recognition (ICPR '10), pp. 617–620, Istanbul, Turkey, August 2010.