e-Informatica Software Engineering Journal

Efficiency of Software Testing Techniques: A Controlled Experiment Replication and Network Meta-analysis


Omar S. Gómez, Karen Cortés-Verdín, César J. Pardo


Background. Common approaches to software verification include static testing techniques, such as code reading, and dynamic testing techniques, such as black-box and white-box testing.

Objective. With the aim of gaining a better understanding of software testing techniques, a controlled experiment replication was conducted, together with a synthesis of previous experiments that examine the efficiency of the code reading, black-box and white-box testing techniques.

Method. The replication reported here consists of four experiments that used instrumented programs. Participants were randomly assigned to apply one of the techniques to one of the instrumented programs. The outcomes were synthesized with those of seven previous experiments using network meta-analysis (NMA).

Results. No significant differences in the efficiency of the techniques were observed. However, the instrumented programs were found to have a significant effect on efficiency. The NMA results suggest that the black-box and white-box techniques behave alike, whereas the efficiency of code reading seems to be sensitive to other factors.

Conclusion. Taking these findings into account, the authors suggest that, before carrying out software verification activities, software engineers should have a clear understanding of the software product to be verified; they can apply either black-box or white-box testing, as both yield similar defect detection rates.
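The abstract names network meta-analysis as the synthesis method without showing how evidence from different experiments is combined. The following is a minimal sketch of the core idea for a triangle of three techniques, assuming a fixed-effect, inverse-variance model and the Bucher adjusted indirect comparison. It is not the authors' analysis pipeline: the effect sizes are hypothetical mean differences in efficiency, and the labels BB/WB/CR and all function names are illustrative. A full NMA fits every comparison in the network jointly, typically with random effects and a formal consistency assessment.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Effect:
    """Hypothetical difference in mean efficiency (technique A minus B) and its variance."""
    estimate: float
    variance: float

    @property
    def se(self) -> float:
        return math.sqrt(self.variance)

    def ci95(self) -> tuple[float, float]:
        # Normal-approximation 95% confidence interval.
        half = 1.96 * self.se
        return (self.estimate - half, self.estimate + half)


def pool_fixed(effects: list[Effect]) -> Effect:
    """Inverse-variance (fixed-effect) pooling of estimates of the same comparison."""
    weights = [1.0 / e.variance for e in effects]
    total = sum(weights)
    estimate = sum(w * e.estimate for w, e in zip(weights, effects)) / total
    return Effect(estimate, 1.0 / total)


def indirect(a_vs_c: Effect, b_vs_c: Effect) -> Effect:
    """Adjusted indirect comparison of A vs B through a common comparator C:
    d_AB = d_AC - d_BC, with Var(d_AB) = Var(d_AC) + Var(d_BC)."""
    return Effect(a_vs_c.estimate - b_vs_c.estimate,
                  a_vs_c.variance + b_vs_c.variance)


def inconsistency_z(direct: Effect, indirect_est: Effect) -> float:
    """z statistic for the disagreement between direct and indirect evidence."""
    diff = direct.estimate - indirect_est.estimate
    return diff / math.sqrt(direct.variance + indirect_est.variance)


if __name__ == "__main__":
    # HYPOTHETICAL numbers, not data from the study.
    # BB = black-box, WB = white-box, CR = code reading.
    bb_vs_cr = pool_fixed([Effect(0.5, 0.04), Effect(0.3, 0.04)])
    wb_vs_cr = pool_fixed([Effect(0.1, 0.04), Effect(0.3, 0.04)])
    bb_vs_wb_direct = Effect(0.1, 0.04)

    # BB vs WB estimated only through the shared comparator CR.
    bb_vs_wb_indirect = indirect(bb_vs_cr, wb_vs_cr)
    z = inconsistency_z(bb_vs_wb_direct, bb_vs_wb_indirect)

    # Mixed (network) estimate: pool direct and indirect evidence,
    # which is only meaningful if the two are consistent.
    bb_vs_wb_mixed = pool_fixed([bb_vs_wb_direct, bb_vs_wb_indirect])
    print(f"indirect BB-WB: {bb_vs_wb_indirect.estimate:.3f}")
    print(f"inconsistency z: {z:.2f}")
    print(f"mixed BB-WB: {bb_vs_wb_mixed.estimate:.3f}, 95% CI {bb_vs_wb_mixed.ci95()}")
```

A large absolute value of `inconsistency_z` would indicate that direct and indirect evidence disagree, in which case pooling them into a single mixed estimate is questionable; this is one reason NMA results can be sensitive to factors that differ across experiments.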


Omar S. Gómez, Karen Cortés-Verdín, César J. Pardo, "Efficiency of Software Testing Techniques: A Controlled Experiment Replication and Network Meta-analysis", In e-Informatica Software Engineering Journal, vol. 11, iss. 1, pp. 77-102, 2017.

