|||"Applying Machine Learning to Software Fault Prediction", In e-Informatica Software Engineering Journal, vol. 12, no. 1, pp. 199–216, 2018.
DOI: , 10.5277/e-Inf180108.|
Bartłomiej Wójcicki, Robert Dąbrowski
Introduction: Software engineering continuously suffers from inadequate software testing. Automated prediction of possibly faulty fragments of source code allows developers to focus development efforts on fault-prone fragments first. Fault prediction has been the topic of many studies concentrating on C/C++ and Java programs, with little focus on programming languages such as Python.

Objectives: In this study the authors verify whether the type of approach used in former fault prediction studies can be applied to Python. More precisely, the primary objective is to conduct preliminary research using simple methods that would support (or contradict) the expectation that predicting faults in Python programs is also feasible. The secondary objective is to establish grounds for more thorough future research and publications, provided promising results are obtained in the preliminary research.

Methods: It has been demonstrated that, using machine learning techniques, it is possible to predict faults in C/C++ and Java projects with recall 0.71 and false positive rate 0.25. A similar approach was applied to find out whether promising results can be obtained for Python projects. The working hypothesis is that choosing Python as the programming language does not significantly alter those results. A preliminary study was conducted in which a basic machine learning technique was applied to several sample Python projects. Success of these efforts would indicate that the selected approach is worth pursuing, as results similar to those obtained for C/C++ and Java can also be obtained for Python; failure would indicate that the selected approach is not appropriate for the selected group of Python projects.

Results: The research provides experimental evidence that fault prediction methods similar to those developed for C/C++ and Java programs can be successfully applied to Python programs, achieving recall up to 0.64 with false positive rate 0.23 (mean recall 0.53, mean false positive rate 0.24). This indicates that more thorough research in this area is worth conducting.

Conclusion: Having obtained promising results with this simple approach, the authors conclude that research on predicting faults in Python programs using machine learning techniques is worth pursuing. Natural ways to enhance future research include using more sophisticated machine learning techniques, adding Python-specific features, and extending the data sets.
Keywords: classifier, fault prediction, machine learning, metric, Naïve Bayes, Python, quality, software intelligence
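The approach summarized in the abstract (the keyword list names Naïve Bayes as the classifier, and recall and false positive rate as the evaluation measures) can be sketched in Python itself. The sketch below is illustrative only: the two static-code metrics (lines of code, cyclomatic complexity) and all data values are assumptions for demonstration, not the study's actual feature set or data; a Gaussian Naïve Bayes model is implemented from scratch to keep the example self-contained.

```python
import math

# Hypothetical training data: each module is described by two assumed
# static-code metrics (lines of code, cyclomatic complexity) and a
# label (1 = faulty, 0 = clean). Values are made up for illustration.
train = [
    ((120.0, 14.0), 1), ((300.0, 25.0), 1), ((210.0, 18.0), 1),
    ((260.0, 22.0), 1), ((30.0, 2.0), 0), ((55.0, 4.0), 0),
    ((40.0, 3.0), 0), ((80.0, 6.0), 0),
]

def fit(data):
    """Estimate a class prior and per-feature mean/variance (Gaussian NB)."""
    stats = {}
    for label in (0, 1):
        rows = [x for x, y in data if y == label]
        prior = len(rows) / len(data)
        params = []
        for i in range(len(rows[0])):
            vals = [r[i] for r in rows]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
            params.append((mean, var))
        stats[label] = (prior, params)
    return stats

def predict(stats, x):
    """Return the class with the highest log-posterior for module x."""
    best, best_lp = None, None
    for label, (prior, params) in stats.items():
        lp = math.log(prior)
        for xi, (mean, var) in zip(x, params):
            lp += -0.5 * math.log(2 * math.pi * var) - (xi - mean) ** 2 / (2 * var)
        if best_lp is None or lp > best_lp:
            best, best_lp = label, lp
    return best

# Evaluate with the paper's measures:
# recall = TP / (TP + FN), false positive rate = FP / (FP + TN).
test = [((250.0, 20.0), 1), ((35.0, 3.0), 0),
        ((150.0, 16.0), 1), ((60.0, 5.0), 0)]
stats = fit(train)
tp = sum(1 for x, y in test if y == 1 and predict(stats, x) == 1)
fn = sum(1 for x, y in test if y == 1 and predict(stats, x) == 0)
fp = sum(1 for x, y in test if y == 0 and predict(stats, x) == 1)
tn = sum(1 for x, y in test if y == 0 and predict(stats, x) == 0)
recall = tp / (tp + fn)
fpr = fp / (fp + tn)
print("recall:", recall, "false positive rate:", fpr)
```

On this cleanly separated toy data the classifier is trivially accurate; on real projects, as the abstract reports, one should expect recall around 0.5–0.6 with a non-zero false positive rate, which is why both measures are reported together.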