Applied Computational Intelligence and Soft Computing
Volume 2016, Article ID 7658207, 12 pages
http://dx.doi.org/10.1155/2016/7658207
Research Article

Prediction of Defective Software Modules Using Class Imbalance Learning

Divya Tomar and Sonali Agarwal

Indian Institute of Information Technology, No. 5203, CC-3 Building, Allahabad, Uttar Pradesh 211012, India

Received 17 November 2015; Accepted 19 January 2016

Academic Editor: Zhang Yi

Copyright © 2016 Divya Tomar and Sonali Agarwal. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. N. Fenton and J. Bieman, Software Metrics: A Rigorous and Practical Approach, CRC Press, Boca Raton, Fla, USA, 2014.
  2. N. E. Fenton and M. Neil, “Software metrics: roadmap,” in Proceedings of the Conference on the Future of Software Engineering (ICSE '00), pp. 357–370, ACM, Limerick, Ireland, June 2000.
  3. A. G. Koru and H. Liu, “Building effective defect-prediction models in practice,” IEEE Software, vol. 22, no. 6, pp. 23–29, 2005.
  4. C. Catal and B. Diri, “A systematic review of software fault prediction studies,” Expert Systems with Applications, vol. 36, no. 4, pp. 7346–7354, 2009.
  5. T. Hall, S. Beecham, D. Bowes, D. Gray, and S. Counsell, “A systematic literature review on fault prediction performance in software engineering,” IEEE Transactions on Software Engineering, vol. 38, no. 6, pp. 1276–1304, 2012.
  6. E. Arisholm, L. C. Briand, and E. B. Johannessen, “A systematic and comprehensive investigation of methods to build and evaluate fault prediction models,” Journal of Systems and Software, vol. 83, no. 1, pp. 2–17, 2010.
  7. S. Wang and X. Yao, “Using class imbalance learning for software defect prediction,” IEEE Transactions on Reliability, vol. 62, no. 2, pp. 434–443, 2013.
  8. J. Zheng, “Cost-sensitive boosting neural networks for software defect prediction,” Expert Systems with Applications, vol. 37, no. 6, pp. 4537–4543, 2010.
  9. Y. Kamei, A. Monden, S. Matsumoto, T. Kakimoto, and K.-I. Matsumoto, “The effects of over and under sampling on fault-prone module detection,” in Proceedings of the 1st International Symposium on Empirical Software Engineering and Measurement (ESEM '07), pp. 196–204, Madrid, Spain, September 2007.
  10. T. M. Khoshgoftaar, K. Gao, and N. Seliya, “Attribute selection and imbalanced data: problems in software defect prediction,” in Proceedings of the 22nd IEEE International Conference on Tools with Artificial Intelligence (ICTAI '10), pp. 137–144, IEEE, Arras, France, October 2010.
  11. J. C. Riquelme, R. Ruiz, D. Rodríguez, and J. Moreno, “Finding defective modules from highly unbalanced datasets,” Actas de los Talleres de las Jornadas de Ingeniería del Software y Bases de Datos, vol. 2, no. 1, pp. 67–74, 2008.
  12. H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263–1284, 2009.
  13. Jayadeva, R. Khemchandani, and S. Chandra, “Twin support vector machines for pattern classification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 5, pp. 905–910, 2007.
  14. Y. Tian, Z. Qi, X. Ju, Y. Shi, and X. Liu, “Nonparallel support vector machines for pattern classification,” IEEE Transactions on Cybernetics, vol. 44, no. 7, pp. 1067–1079, 2014.
  15. D. Tomar and S. Agarwal, “Twin support vector machine: a review from 2007 to 2014,” Egyptian Informatics Journal, vol. 16, no. 1, pp. 55–69, 2015.
  16. O. L. Mangasarian and E. W. Wild, “Multisurface proximal support vector machine classification via generalized eigenvalues,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 1, pp. 69–74, 2006.
  17. M. Arun Kumar and M. Gopal, “Least squares twin support vector machines for pattern classification,” Expert Systems with Applications, vol. 36, no. 4, pp. 7535–7543, 2009.
  18. Y. Sun, A. K. C. Wong, and M. S. Kamel, “Classification of imbalanced data: a review,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 23, no. 4, pp. 687–719, 2009.
  19. S. Kotsiantis, D. Kanellopoulos, and P. Pintelas, “Handling imbalanced datasets: a review,” GESTS International Transactions on Computer Science and Engineering, vol. 30, no. 1, pp. 25–36, 2006.
  20. G. E. Batista, R. C. Prati, and M. C. Monard, “A study of the behavior of several methods for balancing machine learning training data,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20–29, 2004.
  21. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
  22. A. Estabrooks, T. Jo, and N. Japkowicz, “A multiple resampling method for learning from imbalanced data sets,” Computational Intelligence, vol. 20, no. 1, pp. 18–36, 2004.
  23. Y. Liu, X. Yu, J. X. Huang, and A. An, “Combining integrated sampling with SVM ensembles for learning from imbalanced datasets,” Information Processing and Management, vol. 47, no. 4, pp. 617–631, 2011.
  24. X.-Y. Liu, J. Wu, and Z.-H. Zhou, “Exploratory undersampling for class-imbalance learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 39, no. 2, pp. 539–550, 2009.
  25. C. Bunkhumpornpat, K. Sinapiromsaran, and C. Lursinsap, “Safe-Level-SMOTE: safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem,” in Advances in Knowledge Discovery and Data Mining, pp. 475–482, Springer, Berlin, Germany, 2009.
  26. S.-J. Yen and Y.-S. Lee, “Cluster-based under-sampling approaches for imbalanced data distributions,” Expert Systems with Applications, vol. 36, no. 3, pp. 5718–5727, 2009.
  27. S. Chen, G. Guo, and L. Chen, “A new over-sampling method based on cluster ensembles,” in Proceedings of the 24th IEEE International Conference on Advanced Information Networking and Applications Workshops (WAINA '10), pp. 599–604, IEEE, Perth, Australia, April 2010.
  28. L. Mena and J. A. Gonzalez, “Symbolic one-class learning from imbalanced datasets: application in medical diagnosis,” International Journal on Artificial Intelligence Tools, vol. 18, no. 2, pp. 273–309, 2009.
  29. B. X. Wang and N. Japkowicz, “Boosting support vector machines for imbalanced data sets,” Knowledge and Information Systems, vol. 25, no. 1, pp. 1–20, 2010.
  30. M. A. Maloof, “Learning when data sets are imbalanced and when costs are unequal and unknown,” in Proceedings of the Workshop on Learning from Imbalanced Data Sets II (ICML '03), vol. 2, pp. 1–2, Washington, DC, USA, 2003.
  31. M. Kukar and I. Kononenko, “Cost-sensitive learning with neural networks,” in Proceedings of the 13th European Conference on Artificial Intelligence (ECAI '98), pp. 445–449, 1998.
  32. W. Fan, S. J. Stolfo, J. Zhang, and P. K. Chan, “AdaCost: misclassification cost-sensitive boosting,” in Proceedings of the 16th International Conference on Machine Learning (ICML '99), pp. 97–105, Bled, Slovenia, June 1999.
  33. D. Tomar and S. Agarwal, “An effective weighted multi-class least squares twin support vector machine for imbalanced data classification,” International Journal of Computational Intelligence Systems, vol. 8, no. 4, pp. 761–778, 2015.
  34. S. Kanmani, V. R. Uthariaraj, V. Sankaranarayanan, and P. Thambidurai, “Object-oriented software fault prediction using neural networks,” Information and Software Technology, vol. 49, no. 5, pp. 483–492, 2007.
  35. T. M. Khoshgoftaar, E. B. Allen, W. D. Jones, and J. P. Hudepohl, “Classification tree models of software quality over multiple releases,” in Proceedings of the 10th International Symposium on Software Reliability Engineering (ISSRE '99), pp. 116–125, Boca Raton, Fla, USA, November 1999.
  36. R. W. Selby and A. A. Porter, “Learning from examples: generation and evaluation of decision trees for software resource analysis,” IEEE Transactions on Software Engineering, vol. 14, no. 12, pp. 1743–1757, 1988.
  37. K. O. Elish and M. O. Elish, “Predicting defect-prone software modules using support vector machines,” Journal of Systems and Software, vol. 81, no. 5, pp. 649–660, 2008.
  38. G. Czibula, Z. Marian, and I. G. Czibula, “Software defect prediction using relational association rule mining,” Information Sciences, vol. 264, pp. 260–278, 2014.
  39. L. Guo, Y. Ma, B. Cukic, and H. Singh, “Robust prediction of fault-proneness by random forests,” in Proceedings of the 15th International Symposium on Software Reliability Engineering (ISSRE '04), pp. 417–428, Saint-Malo, France, November 2004.
  40. S. Agarwal, D. Tomar, and Siddhant, “Prediction of software defects using twin support vector machine,” in Proceedings of the International Conference on Information Systems and Computer Networks (ISCON '14), pp. 128–132, IEEE, Mathura, India, March 2014.
  41. V. U. B. Challagulla, F. B. Bastani, I.-L. Yen, and R. A. Paul, “Empirical assessment of machine learning based software defect prediction techniques,” International Journal on Artificial Intelligence Tools, vol. 17, no. 2, pp. 389–400, 2008.
  42. J. Moeyersoms, E. Junqué De Fortuny, K. Dejaeger, B. Baesens, and D. Martens, “Comprehensible software fault and effort prediction: a data mining approach,” Journal of Systems and Software, vol. 100, pp. 80–90, 2015.
  43. A. Okutan and O. T. Yıldız, “Software defect prediction using Bayesian networks,” Empirical Software Engineering, vol. 19, no. 1, pp. 154–181, 2014.
  44. S. Amasaki, Y. Takagi, O. Mizuno, and T. Kikuno, “A Bayesian belief network for assessing the likelihood of fault content,” in Proceedings of the 14th IEEE International Symposium on Software Reliability Engineering (ISSRE '03), pp. 215–226, IEEE, Denver, Colo, USA, November 2003.
  45. S. Bibi and I. Stamelos, “Software process modeling with Bayesian belief networks,” in Proceedings of the 10th International Software Metrics Symposium (METRICS '04), vol. 14, p. 16, Chicago, Ill, USA, September 2004.
  46. K. Dejaeger, T. Verbraken, and B. Baesens, “Toward comprehensible software fault prediction models using Bayesian network classifiers,” IEEE Transactions on Software Engineering, vol. 39, no. 2, pp. 237–257, 2013.
  47. G. J. Pai and J. B. Dugan, “Empirical analysis of software fault content and fault proneness using Bayesian methods,” IEEE Transactions on Software Engineering, vol. 33, no. 10, pp. 675–686, 2007.
  48. N. Fenton, M. Neil, and D. Marquez, “Using Bayesian networks to predict software defects and reliability,” Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, vol. 222, no. 4, pp. 701–712, 2008.
  49. C. Catal and B. Diri, “Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem,” Information Sciences, vol. 179, no. 8, pp. 1040–1058, 2009.
  50. M. Evett, T. Khoshgoftaar, P. D. Chien, and E. Allen, “GP-based software quality prediction,” in Proceedings of the 3rd Annual Genetic Programming Conference (GP '98), pp. 60–65, Madison, Wis, USA, July 1998.
  51. A. B. de Carvalho, A. Pozo, and S. R. Vergilio, “A symbolic fault-prediction model based on multiobjective particle swarm optimization,” Journal of Systems and Software, vol. 83, no. 5, pp. 868–882, 2010.
  52. O. Vandecruys, D. Martens, B. Baesens, C. Mues, M. De Backer, and R. Haesen, “Mining software repositories for comprehensible software fault prediction models,” Journal of Systems and Software, vol. 81, no. 5, pp. 823–839, 2008.
  53. Ö. F. Arar and K. Ayan, “Software defect prediction using cost-sensitive neural network,” Applied Soft Computing, vol. 33, pp. 263–277, 2015.
  54. T. M. Khoshgoftaar and K. Gao, “Feature selection with imbalanced data for software defect prediction,” in Proceedings of the 8th International Conference on Machine Learning and Applications (ICMLA '09), pp. 235–240, Miami Beach, Fla, USA, December 2009.
  55. X. Y. Jing, S. Ying, Z. W. Zhang, S. S. Wu, and J. Liu, “Dictionary learning based software defect prediction,” in Proceedings of the 36th International Conference on Software Engineering (ICSE '14), pp. 414–423, ACM, Hyderabad, India, May 2014.
  56. S. Agarwal and D. Tomar, “A feature selection based model for software defect prediction,” International Journal of Advanced Science and Technology, vol. 65, pp. 39–58, 2014.
  57. J. C. Munson and T. M. Khoshgoftaar, “The detection of fault-prone programs,” IEEE Transactions on Software Engineering, vol. 18, no. 5, pp. 423–433, 1992.
  58. S. S. Rathore and A. Gupta, “A comparative study of feature-ranking and feature-subset selection techniques for improved fault prediction,” in Proceedings of the 7th India Software Engineering Conference (ISEC '14), p. 7, ACM, Chennai, India, February 2014.
  59. I. Chawla and S. K. Singh, “An automated approach for bug categorization using fuzzy logic,” in Proceedings of the 8th India Software Engineering Conference (ISEC '15), pp. 90–99, ACM, Bangalore, India, February 2015.
  60. K. Muthukumaran, A. Rallapalli, and N. L. Murthy, “Impact of feature selection techniques on bug prediction models,” in Proceedings of the 8th India Software Engineering Conference (ISEC '15), pp. 120–129, ACM, Bangalore, India, February 2015.
  61. M. D'Ambros, M. Lanza, and R. Robbes, “An extensive comparison of bug prediction approaches,” in Proceedings of the 7th IEEE Working Conference on Mining Software Repositories (MSR '10), pp. 31–41, Cape Town, South Africa, May 2010.
  62. E. Erturk and E. A. Sezer, “A comparison of some soft computing methods for software fault prediction,” Expert Systems with Applications, vol. 42, no. 4, pp. 1872–1879, 2015.
  63. Software Defect Dataset, Promise Repository, May 2015, http://promise.site.uottawa.ca/SERepository/datasets-page.html.
  64. M. Kubat and S. Matwin, “Addressing the curse of imbalanced training sets: one-sided selection,” in Proceedings of the 14th International Conference on Machine Learning (ICML '97), vol. 97, pp. 179–186, Banff, Canada, 1997.
  65. A. Fernández, V. López, M. Galar, M. J. Del Jesus, and F. Herrera, “Analysing the classification of imbalanced data-sets with multiple classes: binarization techniques and ad-hoc approaches,” Knowledge-Based Systems, vol. 42, pp. 97–110, 2013.
  66. J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” The Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.
  67. D. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, Chapman & Hall/CRC, 2nd edition, 2006.