e-Informatica Software Engineering Journal Empirical Study of the Evolution of Python Questions on Stack Overflow


Gopika Syam, Sangeeta Lal and Tao Chen, "Empirical Study of the Evolution of Python Questions on Stack Overflow", in e-Informatica Software Engineering Journal, vol. 17, no. 1, pp. 230107, 2023. DOI: 10.37190/e-Inf230107.



Gopika Syam, Sangeeta Lal, Tao Chen


Background: Python is a popular and easy-to-use programming language. It is constantly expanding, with new features and libraries being introduced daily for a broad range of applications. This dynamic expansion needs a robust support structure for developers to effectively utilise the language.

Aim: In this study, we conduct an in-depth analysis of the Python questions posted on Stack Overflow, focusing on several research topics, to understand the themes of these questions and to identify the challenges that developers encounter.

Method: We perform a quantitative and qualitative analysis of Python questions on Stack Overflow. We also use topic modelling to determine the most popular and the most difficult topics among developers.
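The abstract does not state how topic popularity and difficulty are measured. As a rough illustration only, the Python sketch below assumes metrics often used in Stack Overflow topic studies: popularity as the average view count and score of a topic's questions, and difficulty as the percentage of questions without an accepted answer together with the median time to an accepted answer. The `Question` record, its fields, the `topic_metrics` helper, the topic labels and the sample values are all hypothetical, and each question is assumed to have already been assigned a dominant topic by a topic model such as LDA.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, median
from typing import Dict, List, Optional


@dataclass
class Question:
    # Hypothetical record: one Stack Overflow question after topic assignment.
    topic: str                        # dominant topic from the topic model (assumed given)
    views: int
    score: int
    hours_to_accept: Optional[float]  # None if no answer was accepted


def topic_metrics(questions: List[Question]) -> Dict[str, dict]:
    """Group questions by topic and compute assumed popularity/difficulty metrics."""
    by_topic: Dict[str, List[Question]] = defaultdict(list)
    for q in questions:
        by_topic[q.topic].append(q)

    metrics = {}
    for topic, qs in by_topic.items():
        accept_times = [q.hours_to_accept for q in qs if q.hours_to_accept is not None]
        unaccepted = len(qs) - len(accept_times)
        metrics[topic] = {
            # Popularity (assumed): more views and higher score mean more attention.
            "avg_views": mean(q.views for q in qs),
            "avg_score": mean(q.score for q in qs),
            # Difficulty (assumed): fewer accepted answers and slower acceptance.
            "pct_unaccepted": 100 * unaccepted / len(qs),
            "median_hours_to_accept": median(accept_times) if accept_times else None,
        }
    return metrics


# Toy data for illustration only (not results from the study).
sample = [
    Question("Data Structures", views=1200, score=5, hours_to_accept=0.5),
    Question("Data Structures", views=800, score=3, hours_to_accept=1.5),
    Question("Installation", views=300, score=1, hours_to_accept=None),
    Question("Installation", views=500, score=2, hours_to_accept=30.0),
]

metrics = topic_metrics(sample)
most_popular = max(metrics, key=lambda t: metrics[t]["avg_views"])
most_difficult = max(metrics, key=lambda t: metrics[t]["pct_unaccepted"])
print(most_popular, "|", most_difficult)
```

Ranking on a single metric each keeps the sketch short; a fuller analysis would report several popularity and difficulty measures side by side, since they do not always agree.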

Results: The findings of this study revealed a recent surge in questions about the scientific computing libraries pandas and TensorFlow. We also observed that discussions of Data Structures and Formats are more popular in the Python community, whereas areas such as Installation, Deployment, and IDE remain challenging.
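One simple way to quantify a "surge" in questions about a library is to track, per year, the percentage of Python questions that carry that library's tag. The sketch below assumes each question has been reduced to a `(year, tags)` pair; the `yearly_tag_share` helper and the sample data are hypothetical and do not reproduce the study's dataset or its exact trend measure.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def yearly_tag_share(questions: Iterable[Tuple[int, Set[str]]], tag: str) -> Dict[int, float]:
    """Return, for each year, the percentage of questions that carry `tag`."""
    totals: Counter = Counter()
    tagged: Counter = Counter()
    for year, tags in questions:
        totals[year] += 1
        if tag in tags:
            tagged[year] += 1
    # Normalising by the yearly total separates the growth of one library
    # from the overall growth in the number of Python questions.
    return {year: 100 * tagged[year] / totals[year] for year in sorted(totals)}


# Hypothetical toy data: (creation year, tag set) for each Python question.
sample = [
    (2015, {"python", "django"}),
    (2015, {"python", "list"}),
    (2015, {"python", "pandas"}),
    (2015, {"python", "numpy"}),
    (2020, {"python", "pandas"}),
    (2020, {"python", "tensorflow"}),
    (2020, {"python", "pandas", "dataframe"}),
    (2020, {"python", "string"}),
]

for lib in ("pandas", "tensorflow"):
    print(lib, yearly_tag_share(sample, lib))
```

Using a share rather than a raw count avoids reading a general rise in Python activity as a library-specific surge.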

Conclusion: This study can direct the research and development community to put more emphasis on tackling the actual issues that Python programmers are facing.


Keywords: Python programming, Software Development, Stack Overflow, Topic Modelling


