e-Informatica Software Engineering Journal

A Three Dimensional Empirical Study of Logging Questions From Six Popular Q&A Websites

[1] Harshit Gujral, Abhinav Sharma, Sangeeta Lal and Lov Kumar, "A Three Dimensional Empirical Study of Logging Questions From Six Popular Q&A Websites", e-Informatica Software Engineering Journal, vol. 13, no. 1, pp. 105–139, 2019. DOI: 10.5277/e-Inf190104.



Harshit Gujral, Abhinav Sharma, Sangeeta Lal, Lov Kumar


Background: Q&A websites such as StackOverflow and ServerFault provide an open platform where users can ask questions and get help from experts worldwide. These websites not only answer users' questions but also act as a knowledge base. The data on these websites can be mined to extract valuable information that benefits software practitioners. The software engineering research community has recognized the potential of mining data from Q&A websites, and several research studies have already been conducted in this area.
Aim: The aim of the study presented in this paper is to perform an empirical analysis of logging questions from six popular Q&A websites.
Method: We perform statistical, programming-language, and content analysis of logging questions. This analysis gives us insight into the logging discussions taking place in six different domains of the StackExchange websites.
Results: Our analysis provides insight into the logging issues of software practitioners: logging questions are pervasive across all six Q&A websites; the mean time to get an accepted answer for logging questions is much higher on the Superuser (SU) and ServerFault (SF) websites than on the other websites; a large number of logging questions invite a great amount of discussion on the SoftwareEngineering Q&A website; most logging issues occur in C++ and Java; the number of logging questions is increasing for Java, Python, and JavaScript, whereas it is decreasing or constant for C, C++, and C#; and on the ServerFault and Superuser websites, 'C' is the dominant programming language.
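As an illustration of one statistic named in the results, the mean time to get an accepted answer, the following minimal sketch computes it per website. It is not the authors' implementation: the record layout, the site abbreviations used as keys, and all timestamps are hypothetical and chosen only to make the calculation concrete.

```python
from datetime import datetime
from statistics import mean

# Hypothetical records: (site, time the question was asked, time the accepted answer was posted).
# These values are invented for illustration and are not data from the study.
records = [
    ("SO", "2017-01-01T10:00:00", "2017-01-01T11:00:00"),
    ("SO", "2017-01-02T09:00:00", "2017-01-02T12:00:00"),
    ("SF", "2017-01-01T08:00:00", "2017-01-02T08:00:00"),
    ("SF", "2017-01-03T00:00:00", "2017-01-05T00:00:00"),
]


def mean_hours_to_accepted(rows):
    """Return {site: mean hours between asking and the accepted answer}."""
    per_site = {}
    for site, asked, accepted in rows:
        delta = datetime.fromisoformat(accepted) - datetime.fromisoformat(asked)
        per_site.setdefault(site, []).append(delta.total_seconds() / 3600)
    return {site: mean(hours) for site, hours in per_site.items()}


result = mean_hours_to_accepted(records)
print(result)
```

Questions without an accepted answer would have to be excluded before this step, since they have no second timestamp to compare against.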


classification, debugging, ensemble, logging, machine learning, source code analysis, tracing


