e-Informatica Software Engineering Journal A Deep-Learning-Based Bug Priority Prediction Using RNN-LSTM Neural Networks

2021
[1] Hani Bani-Salameh, Mohammed Sallam and Bashar Al shboul, "A Deep-Learning-Based Bug Priority Prediction Using RNN-LSTM Neural Networks", e-Informatica Software Engineering Journal, vol. 15, no. 1, pp. 29–45, 2021. DOI: 10.37190/e-Inf210102.

Authors

Hani Bani-Salameh, Mohammed Sallam, Bashar Al shboul

Abstract

Context: Predicting the priority of bug reports is an important activity in software maintenance. Bug priority refers to the order in which a bug or defect should be resolved. A huge number of bug reports are submitted every day. Manually filtering bug reports and assigning a priority to each one is a heavy process that requires time, resources, and expertise. Manual assignment is also error-prone, and mistakes in priority keep developers from finishing their tasks, fixing bugs, and improving quality.

Objective: Bugs are widespread, and the number of bug reports submitted by users and team members is increasing noticeably while resources remain limited. This creates a need for a model that detects the priority of bug reports and allows developers to find the highest-priority reports.

This paper presents a model that focuses on predicting and assigning a priority level (high or low) for each bug report.

Method: This model considers a set of factors (indicators), such as component name, summary, assignee, and reporter, that may affect the priority level of a bug report. The factors are extracted as features from a dataset built from bug reports of closed-source projects stored in the JIRA bug tracking system; these features are then used to train and test the framework. This work also presents a tool that helps developers assign a priority level to a bug report automatically, based on the LSTM model's prediction.
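As a rough illustration of how such factors might be turned into model input, the sketch below encodes the four fields named above as fixed-length integer sequences, which is a common input format for an LSTM. It is not the authors' pipeline: the field keys, the `field=value` tagging scheme, the padding length, and the two sample reports are all hypothetical.

```python
from collections import Counter

# Hypothetical field keys mirroring the factors named in the abstract.
FIELDS = ("component", "summary", "assignee", "reporter")


def tokenize(report):
    """Lowercase and whitespace-split every field of one bug report."""
    tokens = []
    for field in FIELDS:
        for word in str(report.get(field, "")).lower().split():
            # Tag non-summary fields so a reporter name and the same word
            # inside a summary map to different tokens (an assumption).
            tokens.append(word if field == "summary" else f"{field}={word}")
    return tokens


def build_vocab(reports, min_count=1):
    """Map each token to an integer id; 0 is padding, 1 is unknown."""
    counts = Counter(tok for r in reports for tok in tokenize(r))
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, n in sorted(counts.items()):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(report, vocab, max_len=12):
    """Convert a report to a list of exactly max_len ids (truncate or pad)."""
    ids = [vocab.get(tok, 1) for tok in tokenize(report)][:max_len]
    return ids + [0] * (max_len - len(ids))


# Made-up sample reports, for illustration only.
reports = [
    {"component": "login", "summary": "App crashes on login",
     "assignee": "sara", "reporter": "omar", "priority": "high"},
    {"component": "ui", "summary": "Typo in settings label",
     "assignee": "ali", "reporter": "omar", "priority": "low"},
]

vocab = build_vocab(reports)
X = [encode(r, vocab) for r in reports]                      # padded id sequences
y = [1 if r["priority"] == "high" else 0 for r in reports]   # 1 = high, 0 = low
```

The resulting `X` and `y` could then be fed to an embedding layer followed by recurrent layers; the specific architecture is left to the paper itself.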

Results: Our experiments consisted of applying a 5-layer deep learning RNN-LSTM neural network and comparing the results with Support Vector Machine (SVM) and K-nearest neighbors (KNN) to predict the priority of bug reports.
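To make the KNN baseline concrete, here is a minimal sketch of a k-nearest-neighbors classifier over bag-of-words vectors built from report summaries. It only illustrates the general technique: the cosine similarity measure, the choice of k = 3, and the toy training set are assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector: word -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, summary, k=3):
    """Return the majority label among the k most similar training summaries."""
    query = bow(summary)
    ranked = sorted(train, key=lambda item: cosine(bow(item[0]), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up (summary, priority) pairs, for illustration only.
train = [
    ("app crashes on login", "high"),
    ("crash when saving payment", "high"),
    ("data loss after app crashes", "high"),
    ("typo in settings label", "low"),
    ("button color is slightly off", "low"),
    ("misaligned label in footer", "low"),
]

print(knn_predict(train, "app crashes on startup"))
```

Unlike an LSTM, this baseline ignores word order entirely, which is one plausible reason a sequence model could do better on free-text summaries.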

The performance of the proposed RNN-LSTM model was analyzed on the JIRA dataset of more than 2000 bug reports. The proposed model achieved 90% accuracy, compared with 74% for KNN and 87% for SVM. On average, RNN-LSTM improves the F-measure by 3% compared to SVM and 15.2% compared to KNN.

Conclusion: We conclude that LSTM predicts and assigns bug priority more accurately and effectively than the other ML algorithms (KNN and SVM). LSTM significantly improves the average F-measure in comparison to the other classifiers. The study showed that LSTM achieved the best results on all performance measures (Accuracy = 0.908, AUC = 0.95, F-measure = 0.892).
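For readers unfamiliar with the three measures, the sketch below shows how accuracy, F-measure, and AUC are commonly computed for a binary high/low classifier. The labels and scores are invented for illustration and are unrelated to the paper's data; the 0.5 decision threshold is likewise an assumption.

```python
def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 = high priority."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted label matches the true label."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels and model scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), f_measure(y_true, y_pred), auc(y_true, scores))
```

Accuracy and F-measure depend on the chosen threshold, whereas AUC depends only on how the scores rank the reports, which is why the three are typically reported together.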

Keywords

Assigning, Priority, Bug Tracking Systems, Bug Priority, Bug Severity, Closed-Source, Data Mining, Machine Learning (ML), Deep Learning, RNN-LSTM, SVM, KNN

