e-Informatica Software Engineering Journal Examining the Predictive Capability of Advanced Software Fault Prediction Models – An Experimental Investigation Using Combination Metrics


Pooja Sharma and Amrit Lal Sangal, "Examining the Predictive Capability of Advanced Software Fault Prediction Models – An Experimental Investigation Using Combination Metrics", e-Informatica Software Engineering Journal, vol. 16, no. 1, pp. 220104, 2022. DOI: 10.37190/e-Inf220104.



Pooja Sharma, Amrit Lal Sangal


Background: Fault prediction is a key problem in the software engineering domain. In recent years, there has been growing interest in exploiting machine learning techniques to make informed, data-driven decisions that improve software quality.

Aim: The study aims to build advanced fault prediction models based on product and process metrics, using machine learning classifiers and ensemble design, and to examine their predictive capability.

Method: The authors developed a methodological framework consisting of three phases: (i) metrics identification, (ii) experimentation using base ML classifiers and ensemble design, and (iii) evaluation of performance and cost sensitiveness. The study was conducted on 32 projects from the PROMISE, BUG, and JIRA repositories.
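The abstract does not name the base classifiers or the ensemble scheme, so the following is only a minimal sketch of how phases (i) and (ii) could fit together, not the authors' implementation. It assumes hypothetical metric names (`wmc`, `loc` as product metrics; `revisions`, `authors` as process metrics), decision stumps and a nearest-centroid rule as base classifiers, and plain majority voting as the ensemble.

```python
from statistics import mean

# Hypothetical metric names (assumption): product metrics describe size and
# complexity, process metrics describe change history. The study's set may differ.
PRODUCT = ("wmc", "loc")
PROCESS = ("revisions", "authors")


def combine_metrics(module):
    """Phase (i): concatenate product and process metrics into one feature vector."""
    return [float(module[name]) for name in PRODUCT + PROCESS]


def train_stump(X, y, feature):
    """Base classifier: threshold one feature at the midpoint of the class means."""
    faulty_mean = mean(x[feature] for x, label in zip(X, y) if label == 1)
    clean_mean = mean(x[feature] for x, label in zip(X, y) if label == 0)
    threshold = (faulty_mean + clean_mean) / 2
    if faulty_mean >= clean_mean:
        return lambda x: 1 if x[feature] > threshold else 0
    return lambda x: 1 if x[feature] < threshold else 0


def train_nearest_centroid(X, y):
    """Base classifier: predict the class whose mean feature vector is closer.

    Features are not scaled here, so large-valued metrics such as LOC dominate
    the distance; a real pipeline would normally normalise them first."""
    def centroid(label):
        rows = [x for x, lab in zip(X, y) if lab == label]
        return [mean(column) for column in zip(*rows)]

    faulty_c, clean_c = centroid(1), centroid(0)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return lambda x: 1 if sq_dist(x, faulty_c) < sq_dist(x, clean_c) else 0


def majority_vote(classifiers):
    """Phase (ii) ensemble: flag a module as faulty when most base classifiers do."""
    return lambda x: 1 if 2 * sum(clf(x) for clf in classifiers) > len(classifiers) else 0


# Toy training data, made up for illustration (not from PROMISE, BUG, or JIRA).
train = [
    {"wmc": 5, "loc": 120, "revisions": 2, "authors": 1, "faulty": 0},
    {"wmc": 7, "loc": 200, "revisions": 3, "authors": 1, "faulty": 0},
    {"wmc": 25, "loc": 900, "revisions": 14, "authors": 4, "faulty": 1},
    {"wmc": 30, "loc": 1100, "revisions": 9, "authors": 3, "faulty": 1},
]
X = [combine_metrics(m) for m in train]
y = [m["faulty"] for m in train]

# Four single-feature stumps plus one nearest-centroid rule: five voters, so no ties.
base = [train_stump(X, y, f) for f in range(len(X[0]))]
base.append(train_nearest_centroid(X, y))
ensemble = majority_vote(base)

print(ensemble(combine_metrics({"wmc": 28, "loc": 1000, "revisions": 12, "authors": 4})))  # prints 1
```

Majority voting is used only because it is the simplest ensemble to show; bagging, boosting, or stacking would replace `majority_vote` without changing the combined-metrics step.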

Result: The results show that the advanced fault prediction models built using ensemble methods achieve an overall median F-score ranging between 76.50% and 87.34% and a ROC AUC between 77.09% and 84.05%, with better predictive capability and cost sensitiveness. In addition, non-parametric tests were applied to assess the statistical significance of the differences among the classifiers.
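As a rough illustration of the reported measures, the sketch below computes the F-score and ROC AUC for one project, takes a median across projects, and applies a Friedman test to classifier scores. F-score and AUC follow their standard definitions; the abstract does not say which non-parametric tests were used, so the Friedman test (common for comparing several classifiers over many projects) is an assumption, and all numbers are made up.

```python
from statistics import median


def f_score(y_true, y_pred):
    """F1: harmonic mean of precision and recall for the faulty class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random faulty module outscores a random clean one.

    Ties count as half a win (the Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def friedman_statistic(table):
    """Friedman chi-square over table[project][classifier] (higher metric = better).

    Returns each classifier's average rank (1 = best, tied values share the mean
    rank) and the statistic without tie correction, to be compared with a
    chi-square distribution with k - 1 degrees of freedom."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            for t in range(i, j + 1):
                rank_sums[order[t]] += (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
            i = j + 1
    avg_ranks = [r / n for r in rank_sums]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    return avg_ranks, chi2


print(f_score([1, 1, 0, 0], [1, 0, 1, 0]))              # prints 0.5
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))      # prints 0.75

# Toy per-project F-scores (rows) for three hypothetical classifiers (columns).
table = [
    [0.86, 0.80, 0.71],
    [0.79, 0.74, 0.70],
    [0.88, 0.81, 0.76],
    [0.77, 0.77, 0.65],
]
print(median(row[0] for row in table))  # median F-score of the first classifier
avg_ranks, chi2 = friedman_statistic(table)
print(avg_ranks, chi2)  # prints [1.125, 1.875, 3.0] 7.125
```

With three classifiers there are 2 degrees of freedom, whose 0.05 critical value is 5.991, so a statistic of 7.125 would indicate a significant difference in this toy table; a post-hoc test (for example Nemenyi) would then identify which pairs differ.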

Conclusion: The proposed advanced models performed impressively well for inter-project fault prediction on projects from the PROMISE, BUG, and JIRA repositories.
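The abstract does not describe how the inter-project setting was organised. One common arrangement, assumed here purely for illustration, is leave-one-project-out: train on all other projects and test on the held-out one. The sketch uses a 1-nearest-neighbour rule and accuracy only to stay short; the project names and data are hypothetical.

```python
def nearest_neighbour(X_train, y_train):
    """Predict the label of the closest training module (squared Euclidean distance)."""
    def predict(x):
        best = min(
            range(len(X_train)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, X_train[i])),
        )
        return y_train[best]
    return predict


def leave_one_project_out(projects):
    """For each target project, train on every other project and score the target."""
    results = {}
    for target, (X_test, y_test) in projects.items():
        X_train, y_train = [], []
        for name, (X, y) in projects.items():
            if name != target:
                X_train.extend(X)
                y_train.extend(y)
        model = nearest_neighbour(X_train, y_train)
        correct = sum(model(x) == label for x, label in zip(X_test, y_test))
        results[target] = correct / len(y_test)
    return results


# Hypothetical projects: two already-normalised metrics per module, label 1 = faulty.
projects = {
    "proj_a": ([[0.1, 0.1], [0.9, 0.9]], [0, 1]),
    "proj_b": ([[0.2, 0.1], [0.8, 0.9]], [0, 1]),
    "proj_c": ([[0.1, 0.2], [0.9, 0.8]], [0, 1]),
}
print(leave_one_project_out(projects))  # prints {'proj_a': 1.0, 'proj_b': 1.0, 'proj_c': 1.0}
```

The key property is that the target project's modules never appear in its own training set, which is what distinguishes inter-project from within-project evaluation.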


product and process metrics, classifiers, ensemble design, software fault prediction, software quality


