Mathematical Problems in Engineering
Volume 2014, Article ID 358942, 14 pages
http://dx.doi.org/10.1155/2014/358942
Research Article

A Novel Selective Ensemble Algorithm for Imbalanced Data Classification Based on Exploratory Undersampling

School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China

Received 22 November 2013; Revised 23 January 2014; Accepted 14 February 2014; Published 30 March 2014

Academic Editor: Panos Liatsis

Copyright © 2014 Qing-Yan Yin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
