e-Informatica Software Engineering Journal

Index-Based Type-3 Clone Detection

2026
[1] Zdenek Tronicek, "Index-Based Type-3 Clone Detection", in e-Informatica Software Engineering Journal, vol. 20, no. 1, pp. 260105, 2026. DOI: 10.37190/e-Inf260105.

Authors

Zdenek Tronicek

Abstract

Context: Clone detection is a common task in software engineering. Type-3 clones are code fragments that are similar but differ slightly in structure, for example through added, removed, or modified statements.
Objective: The article presents a new algorithm for Type-3 clone detection, its open-source implementation called DrDupLex3, and novel open-source tools for the automated assessment of Type-3 clones and for preparing training sets for machine learning-based clone detectors.
Method: The algorithm for Type-3 clone detection builds upon the index of source code used in DrDupLex, the most accurate Type-2 clone detector to date.
Results: In a comparison with three state-of-the-art clone detectors (NiCad, CloneWorks, and SourcererCC), DrDupLex3 outperforms all three in precision, recall, and running time. It reported no false positives and found all clones reported by NiCad, CloneWorks, and SourcererCC.
Conclusions: The presented clone detector outperforms three state-of-the-art competitors in an evaluation that is easy to repeat because it relies on tools for the automated assessment of Type-3 clones.
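
Illustration: To make the notion of a Type-3 (near-miss) clone concrete, the following minimal sketch compares two Java-like fragments by normalizing identifiers and literals and then computing a token-level edit distance. It is not the DrDupLex3 algorithm and does not use its index. The tokenizer, the keyword set, the placeholder names `ID` and `LIT`, and the `similarity` measure are all assumptions made for this illustration only.

```python
import re

# Hypothetical, simplified tokenizer: words, integer literals, or single symbols.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Small illustrative keyword set (an assumption, not a full Java keyword list).
KEYWORDS = {"int", "for", "if", "return", "continue"}

def normalize(code):
    """Tokenize and replace identifiers and numeric literals with placeholders."""
    tokens = []
    for tok in TOKEN.findall(code):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok[0].isdigit():
            tokens.append("LIT")
        else:
            tokens.append(tok)
    return tokens

def edit_distance(a, b):
    """Token-level Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]

def similarity(f1, f2):
    """1.0 for identical normalized token sequences, lower as edits accumulate."""
    t1, t2 = normalize(f1), normalize(f2)
    longest = max(len(t1), len(t2))
    return 1.0 if longest == 0 else 1 - edit_distance(t1, t2) / longest

frag_a = "int sum = 0; for (int i = 0; i < n; i++) { sum += a[i]; } return sum;"
# Same as frag_a with renamed variables only (a Type-2 clone).
frag_renamed = "int total = 0; for (int k = 0; k < len; k++) { total += v[k]; } return total;"
# Renamed variables plus one inserted statement (a Type-3 clone).
frag_b = ("int total = 0; for (int k = 0; k < len; k++) "
          "{ if (v[k] < 0) continue; total += v[k]; } return total;")

print(similarity(frag_a, frag_renamed))  # 1.0
print(similarity(frag_a, frag_b))        # 0.75
```

After normalization the renamed fragment is identical to the original, whereas the fragment with the inserted `if` statement differs by 11 inserted tokens out of 44. A detector based on such a measure would report the pair only if the similarity meets a chosen threshold, and that threshold is itself a tuning assumption here.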

Keywords

clone detection, code clones, near-miss clones
