Mathematical Problems in Engineering
Volume 2018, Article ID 5036710, 13 pages
https://doi.org/10.1155/2018/5036710
Research Article

Classifying Imbalanced Data Sets by a Novel RE-Sample and Cost-Sensitive Stacked Generalization Method

Department of Computer Science, Taiyuan Normal University, Taiyuan 030012, China

Correspondence should be addressed to Jianhong Yan; yan_jian_hong@163.com

Received 3 June 2017; Revised 1 October 2017; Accepted 6 November 2017; Published 23 January 2018

Academic Editor: Michele Migliore

Copyright © 2018 Jianhong Yan and Suqing Han. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
