"A Novel Approach for Enhancing Code Smell Detection Using Random Convolutional Kernel Transform", e-Informatica Software Engineering Journal, vol. 19, no. 1, p. 250106, 2025. DOI: 10.37190/e-Inf250106.
Authors
Mostefai Abdelkader, Mekour Mansour
Abstract
Context: In software engineering, the presence of code smells is closely associated with increased maintenance cost and complexity, making their detection and remediation an important concern.
Objective: Despite numerous deep learning approaches for code smell detection, many still heavily rely on feature engineering processes (metrics) and exhibit limited performance. To address these shortcomings, this paper introduces CSDXR, a novel approach for enhancing code smell detection based on Random Convolutional Kernel Transform – a state-of-the-art technique for time series classification. The proposed approach does not rely on a manual feature engineering process and follows a three-step process: first, it converts code snippets into numerical sequences through tokenization; second, it applies Random Convolutional Kernel Transform to generate pooled models from these sequences; and third, it constructs a classifier from the pooled models to identify code smells.
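The three-step pipeline described above can be sketched in miniature. The snippet below is an illustrative approximation, not the authors' implementation: the whitespace tokenizer, kernel length, kernel count, and pooling choices are assumptions, loosely following the ROCKET recipe of random convolutional kernels followed by max and proportion-of-positive-values (PPV) pooling.

```python
import numpy as np

def tokenize(code, vocab):
    # Step 1: map each code token to an integer id
    # (hypothetical whitespace tokenizer for illustration only)
    return np.array([vocab.setdefault(t, len(vocab) + 1)
                     for t in code.split()], dtype=float)

def rocket_features(seq, kernels):
    # Steps 2-3: convolve the token sequence with each random kernel,
    # then pool every output into two features: its maximum value and
    # the proportion of positive values (PPV), as in ROCKET-style methods
    feats = []
    for weights, bias in kernels:
        out = np.convolve(seq, weights, mode="valid") + bias
        feats.extend([out.max(), (out > 0).mean()])
    return np.array(feats)

# 100 random kernels of length 9 (both numbers are illustrative choices)
rng = np.random.default_rng(0)
kernels = [(rng.normal(size=9), rng.normal()) for _ in range(100)]

vocab = {}
snippet = "if ( x > 0 ) { return x ; }"
features = rocket_features(tokenize(snippet, vocab), kernels)
# 'features' (here 200 values: 2 per kernel) would then be fed to an
# ordinary classifier trained to label the snippet as smelly or clean
```

In a full pipeline, these per-snippet feature vectors would be stacked into a matrix and passed to any standard classifier; the key point is that no hand-crafted code metrics are involved.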
Method: The proposed approach was evaluated on four real-world datasets and compared against four state-of-the-art methods – DeepSmells, AE-Dense, AE-CNN, and AE-LSTM – in detecting Complex Method, Multifaceted Abstraction, Feature Envy, and Complex Conditional smells.
Results: Empirical results demonstrate that CSDXR outperformed the four state-of-the-art methods – DeepSmells, AE-Dense, AE-CNN, and AE-LSTM – in detecting Complex Method and Multifaceted Abstraction smells. Specifically, the enhancement rates in terms of F1 score were 1.99% and 6.09% for Complex Method and Multifaceted Abstraction smells, respectively. In terms of MCC, the improvement rates were 0.82% and 35.64% for these two smells, respectively. The results also show that while DeepSmells achieves superior overall performance on Feature Envy and Complex Conditional smells, CSDXR surpasses AE-Dense, AE-CNN, and AE-LSTM in detecting these two types of smells.
Conclusions: The paper concludes that the proposed approach, CSDXR, demonstrates significant potential for effectively detecting various types of code smells.
Keywords
Code smell, Maintenance, Classification, Random convolutional transform, Deep learning
References
1. P. Avgeriou et al., “Managing technical debt in software engineering(dagstuhl seminar 16162),” Dagstuhl Reports, Vol. 6, No. 4, 2016.
2. P. Kruchten, R.L. Nord, and I. Ozkaya, “Technical debt: From metaphorto theory and practice,” IEEE Software, Vol. 29, No. 6, 2012, pp. 18–21.
3. A. Alazba, H. Aljamaan, and M. Alshayeb, “Deep learning approachesfor bad smell detection: A systematic literature review,” Empirical SoftwareEngineering, Vol. 28, No. 77, 2023.
4. T. Sharma and D. Spinellis, “A survey on software smells,” Journal ofSystems and Software, Vol. 138, 2018, pp. 158–173.
5. N. Sae-Lim, S. Hayashi, and M. Saeki, “How do developers select andprioritize code smells? a preliminary study,” in Proc. IEEE 33rd Int. Conf.Software Maintenance and Evolution (ICSME), 2017, pp. 484–488.
6. M. Hozano, A. Garcia, B. Fonseca, and E. Costa, “Are you smelling it?investigating how similar developers detect code smells,” Inf. Softw. Technol.,Vol. 93, 2018, pp. 130–146.
7. T. Sharma, V. Efstathiou, P. Louridas, and D. Spinellis, “Code smelldetection by deep direct-learning and transfer-learning,” Journal of Systemsand Software, Vol. 176, 2021, p. 110936.
8. M.I. Azeem, F. Palomba, L. Shi, and Q. Wang, “Machine learningtechniques for code smell detection: A systematic literature review andmeta-analysis,” Inf. Softw. Technol., Vol. 108, 2019, pp. 115–138.
9. A. Ho, A.M. Bui, P.T. Nguyen, and A.D. Salle, “Fusion of deepconvolutional and lstm recurrent neural networks for automated detection ofcode smells,” in Proc. 27th Int. Conf. Evaluation and Assessment in SoftwareEngineering, 2023, pp. 229–234.
10. W. Xu and X. Zhang, “Multi-granularity code smell detection using deep
learning method based on abstract syntax tree,” in Proc. Int. Conf. SoftwareEngineering and Knowledge Engineering, 2021.
11. A. Bagnall, J. Lines, A. Bostrom, J. Large, and E. Keogh, “The greattime series classification bake off: A review and experimental evaluationof recent algorithmic advances,” Data Mining and Knowledge Discovery,Vol. 31, No. 3, 2017, pp. 606–660.
12. M. Middlehurst, P. Schäfer, and A. Bagnall, “Bake off redux: A reviewand experimental evaluation of recent time series classification algorithms,”arXiv Preprint, 2023, arXiv:2304.13029.
13. M. Middlehurst, P. Schäfer, and A. Bagnall, “Bake off redux: A reviewand experimental evaluation of recent time series classification algorithms,”Data Mining and Knowledge Discovery, 2024, pp. 1–74.
14. J. Zhang, F.Y. Wang, K. Wang, W.H. Lin, X. Xu et al., “Data-drivenintelligent transportation systems: A survey,” IEEE Trans. IntelligentTransportation Systems, Vol. 12, No. 4, 2011, pp. 1624–1639.
15. Y. Zhou, Z. Ding, Q. Wen, and Y. Wang, “Robust load forecastingtowards adversarial attacks via bayesian learning,” IEEE Trans. PowerSystems, Vol. 38, No. 2, 2023, pp. 1445–1459.
16. A.A. Cook, G. Mısırlı, and Z. Fan, “Anomaly detection for iottime-series data: A survey,” IEEE Internet of Things Journal, Vol. 7, No. 7,2019, pp. 6481–6494.
17. M. Abdelkader, “A novel method for code clone detection based onminimally random kernel convolutional transform,” IEEE Access, Vol. 12,2024, pp. 158 579–158 596.
18. M. Fowler and K. Beck, Refactoring: Improving the Design of ExistingCode. Addison-Wesley Professional, 1999.
19. G. Suryanarayana, G. Samarthyam, and T. Sharma, Refactoring forSoftware Design Smells: Managing Technical Debt. Morgan Kaufmann, 2014.
20. A. Dempster, F. Petitjean, and G.I. Webb, “Rocket: Exceptionally fastand accurate time series classification using random convolutional kernels,”Data Mining and Knowledge Discovery, Vol. 34, No. 5, 2020.
21. A. Dempster, D.F. Schmidt, and G.I. Webb, “Minirocket: A very fast(almost) deterministic transform for time series classification,” in Proc.ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, 2021,pp. 248–257.
22. B. Nyirongo, Y. Jiang, H. Jiang, and H. Liu, “A survey of deep learningbased software refactoring,” arXiv preprint arXiv:2404.19226, 2024.
23. T. Sharma and M. Kessentini, “Qscored: A large dataset of code smellsand quality metrics,” in Proc. IEEE/ACM 18th Int. Conf. Mining SoftwareRepositories (MSR), 2021, pp. 590–594.
24. Y. Zhang, C. Ge, H. Liu, and K. Zheng, “Code smell detection basedon supervised learning models: A survey,” Neurocomputing, Vol. 565, 2024,p. 127014.
25. R.S. Pressman, Software Engineering: A Practitioner’s Approach, 6th ed.Palgrave Macmillan, 2005.
26. N.E. Fenton, Software Metrics—A Rigorous Approach. London:Chapman & Hall, 1991.
27. R. Marinescu, “Detection strategies: Metrics-based rules for detectingdesign flaws,” in Proc. 20th Int. Conf. Software Maintenance (ICSM 2004),Chicago, IL, USA, 2004, pp. 350–359.
28. R. Marinescu, “Measurement and quality in object-oriented design,” inProc. 21st IEEE Int. Conf. Software Maintenance (ICSM 2005), Budapest,Hungary, 2005, pp. 701–704.
29. I.M. Bertran, A. Garcia, and A. von Staa, “Defining and applyingdetection strategies for aspect-oriented code smells,” in Proc. 24th SBES2010, Salvador, Bahia, Brazil, 2010, pp. 60–69.
30. A.M. Fard and A. Mesbah, “Jsnose: Detecting javascript code smells,” inProc. 13th IEEE Int. Working Conf. Source Code Analysis and Manipulation(SCAM 2013), Eindhoven, Netherlands, 2013, pp. 116–125.
31. Z. Chen, L. Chen, W. Ma, X. Zhou, Y. Zhou et al., “Understanding
metric-based detectable smells in python software: A comparative study,” Inf.Softw. Technol., Vol. 94, 2018, pp. 14–29.
32. N. Moha, Y.G. Guéhéneuc, L. Duchien, and A.F.L. Meur, “Decor: Amethod for the specification and detection of code and design smells,” IEEETrans. Software Eng., Vol. 36, No. 1, 2010, pp. 20–36.
33. G. Suryanarayana, G. Samarthyam, and T. Sharma, Refactoring forSoftware Design Smells: Managing Technical Debt. Morgan Kaufmann, 2014.
34. F.A. Fontana, M.V. Mäntylä, M. Zanoni, and A. Marino, “Comparingand experimenting with machine learning techniques forcode smell detection,” Empirical Software Engineering, Vol. 21, No. 3, 2016,pp. 1143–1191.
35. A. Maiga, N. Ali, N. Bhattacharya, A. Sabané, Y.G. Guéhéneuc et al.,“Support vector machines for anti-pattern detection,” in Proc. 27thIEEE/ACM Int. Conf. Automated Software Engineering (ASE 2012), 2012,pp. 278–281.
36. F. Khomh, S. Vaucher, Y.G. Guéhéneuc, and H. Sahraoui, “A bayesianapproach for the detection of code and design smells,” in Proc. 9th Int. Conf.Quality Software (QSIC 2009), 2009, pp. 305–314.
37. J. Kreimer, “Adaptive detection of design flaws,” Electronic Notes inTheoretical Computer Science, Vol. 141, No. 4, 2005, pp. 117–136.
38. M. Škipina, J. Slivka, N. Luburić, and A. Kovačević, “Automaticdetection of code smells using metrics and codet5 embeddings: A case studyin c,” Neural Computing and Applications, 2024, pp. 1–18.
39. M. Hadj-Kacem and N. Bouassida, “A hybrid approach to detect codesmells using deep learning,” in Proc. Int. Conf. Evaluation of NovelApproaches to Software Engineering, 2018.
40. H. Liu, Z. Xu, and Y. Zou, “Deep learning-based feature envydetection,” in Proc. 33rd IEEE/ACM Int. Conf. Automated SoftwareEngineering (ASE 2018), 2018, pp. 385–396.
41. B. Liu, H. Liu, G. Li, N. Niu, Z. Xu et al., “Deep learning-basedfeature envy detection boosted by real-world examples,” in Proc. 31st
ACM Joint European Software Engineering Conf. and Symp. Foundations ofSoftware Engineering (ESEC/FSE 2023), 2023, pp. 908–920.
42. A.K. Das, S. Yadav, and S. Dhal, “Detecting code smells using deeplearning,” in Proc. TENCON 2019 – IEEE Region 10 Conf., 2019,pp. 2081–2086.
43. D. Yu, Y. Xu, L. Weng, J. Chen, X. Chen et al., “Detecting andrefactoring feature envy based on graph neural network,” in Proc. IEEE33rd Int. Symp. Software Reliability Engineering (ISSRE 2022), 2022,pp. 458–469.
44. H. Zhang and T. Kishi, “Long method detection using graphconvolutional networks,” J. Inf. Process., Vol. 31, Aug. 2023, pp. 469–477.
45. Y. Zhang, C. Ge, S. Hong, R. Tian, C.R. Dong et al., “Delesmell: Codesmell detection based on deep learning and latent semantic analysis,” Knowl.Based Syst., Vol. 255, 2022, p. 109737.
46. Y. Zhang and C. Dong, “Mars: Detecting brain class/method code smellbased on metric–attention mechanism and residual network,” J. Softw.: Evol.Process, 2021.
47. Y. Li and X. Zhang, “Multi-label code smell detection with hybrid modelbased on deep learning,” in Proc. Int. Conf. Software Engineering andKnowledge Engineering, 2022.
48. H. Liu, J. Jin, Z. Xu, Y. Zou, Y. Bu et al., “Deep learning basedcode smell detection,” IEEE Trans. Software Eng., Vol. 47, No. 9, 2019,pp. 1811–1837.
49. K. Alkharabsheh, S. Alawadi, V.R. Kebande, Y. Crespo, M.F. Delgadoet al., “A comparison of machine learning algorithms on design smelldetection using balanced and imbalanced dataset: A study of god class,” Inf.Softw. Technol., Vol. 143, 2022, p. 106736.
50. P. Probst, B. Bischl, and A.L. Boulesteix, “Tunability: Importanceof hyperparameters of machine learning algorithms,” arXiv preprintarXiv:1802.09596, 2018.
51. L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar,
“Hyperband: A novelbandit-based approach to hyperparameter optimization,” in Proc. Int. Conf.Learning Representations, 2018, pp. 1–48.
52. L. Li, K. Jamieson, A. Rostamizadeh, E. Gonina, J. Ben-Tzur et al.,“A system for massively parallel hyperparameter tuning,” in Proc. MachineLearning and Systems, Vol. 2, 2020, pp. 230–246.
53. S.H. Walker and D.B. Duncan, “Estimation of the probability of an eventas a function of several independent variables,” Biometrika, Vol. 54, No. 1–2,1967, pp. 167–179.
54. T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” inProc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining,2016, pp. 785–794.
55. T.K. Ho, “Random decision forests,” in Proc. 3rd Int. Conf. DocumentAnalysis and Recognition, 1995, pp. 278–282.
56. J.G.H. John and P. Langley, “Estimating continuous distributions inbayesian classifiers,” arXiv preprint arXiv:1302.4964, 2013.
57. F. Yang, “An extended idea about decision trees,” in 2019 Int.Conf. Computational Science and Computational Intelligence (CSCI), 2019,pp. 349–354.
58. Student, “The probable error of a mean,” Biometrika, Vol. 6, 1908,pp. 1–25.
59. L.V. Hedges, “Estimation of effect size from a series of independentexperiments,” Psychological Bulletin, Vol. 92, No. 2, 1982, pp. 490–499.
60. J. Cohen, Statistical power analysis for the behavioral sciences, 2nd ed.Routledge, 1988.
61. P.D. Ellis, The essential guide to effect sizes: Statistical power,meta-analysis, and the interpretation of research results. CambridgeUniversity Press, 2010.
62. J.M. Hoenig and D.M. Heisey, “The abuse of power: The pervasive
fallacy of power calculations for data analysis,” The American Statistician,Vol. 55, No. 1, 2001, pp. 1–6.
63. F.J. Massey, “The kolmogorov-smirnov test for goodness of fit,” Am. Stat.Assoc., Vol. 46, No. 253, 1951, pp. 68–78.