Recommendations for analysing and meta-analysing small sample size software engineering experiments

Kitchenham, Barbara; Madeyski, Lech

doi:10.1007/s10664-024-10504-1

Abstract

Context: Software engineering (SE) experiments often have small sample sizes. This can produce data sets with non-normal characteristics. That is a problem because standard parametric meta-analysis, which uses the standardized mean difference (StdMD) effect size, assumes normally distributed sample data. Small sample sizes and non-normal data set characteristics can also lead to unreliable estimates of parametric effect sizes. Meta-analysis is even more complicated if experiments use complex experimental designs, such as two-group and four-group cross-over designs, which are popular in SE experiments.

Objective: Our objective was to develop a validated and robust meta-analysis method that helps address the problems of small sample sizes and complex experimental designs without relying on data samples being normally distributed.

Method: We used real SE data sets to illustrate the challenges. Building on previous research, we developed a robust meta-analysis method able to deal with challenges typical of SE experiments. We validated the method via simulations comparing StdMD with two robust alternatives: the probability of superiority (p̂) and Cliff’s d.

Results: We confirmed that many SE data sets are small and that small experiments run the risk of exhibiting non-normal properties, which can cause problems for analysing families of experiments. For simulations of individual experiments and meta-analyses of families of experiments, p̂ and Cliff’s d consistently outperformed StdMD by having negligible small-sample bias. They also had better power for log-normal and Laplace samples, although lower power for normal and gamma samples. Tests based on p̂ always had power equal to or greater than tests based on Cliff’s d. Across all but one simulation condition, Type 1 error rates for p̂ were also less biased.

Conclusions: Using p̂ is a low-risk option for analysing and meta-analysing data from small sample-size SE randomized experiments. Parametric methods are preferable only if you have prior knowledge of the data distribution.
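The abstract compares three effect sizes for two independent groups. The sketch below computes their point estimates using only the standard library. It is a minimal illustration, not the authors' implementation, and it covers no variance estimates, significance tests, cross-over designs or meta-analytic pooling. Several details are assumptions. p̂ is taken as the common definition P(X > Y) + 0.5·P(X = Y), estimated over all pairs. Cliff's d is taken as P(X > Y) − P(X < Y), which equals 2p̂ − 1. StdMD is computed as Hedges' g with the usual small-sample correction, because the record does not say which StdMD variant is meant. The function names and the data values are hypothetical.

```python
import math
from itertools import product
from statistics import mean, stdev


def probability_of_superiority(x, y):
    """Hypothetical helper: p-hat = P(X > Y) + 0.5 * P(X == Y), estimated over all n_x * n_y pairs."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in product(x, y))
    return wins / (len(x) * len(y))


def cliffs_d(x, y):
    """Hypothetical helper: Cliff's d = P(X > Y) - P(X < Y), which equals 2 * p_hat - 1."""
    signs = sum((a > b) - (a < b) for a, b in product(x, y))
    return signs / (len(x) * len(y))


def std_mean_difference(x, y):
    """Hypothetical helper: Hedges' g (pooled-SD mean difference with small-sample correction).

    Assumes normally distributed data, the assumption the abstract warns about.
    """
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    d = (mean(x) - mean(y)) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (nx + ny) - 9)  # approximate small-sample correction factor J
    return d * correction


# Hypothetical scores for a small two-group experiment.
treatment = [12, 15, 9, 20, 14]
control = [10, 11, 8, 13, 9]

print(probability_of_superiority(treatment, control))  # 0.82
print(cliffs_d(treatment, control))  # 0.64
print(std_mean_difference(treatment, control))
```

The two non-parametric estimates depend only on how observations are ordered, not on their magnitudes. This is why they are unaffected by skewed or heavy-tailed samples such as log-normal or Laplace data. Hedges' g relies on means and a pooled standard deviation, both of which outliers can distort in small samples.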

Citation

Kitchenham, B., & Madeyski, L. (2024). Recommendations for analysing and meta-analysing small sample size software engineering experiments. Empirical Software Engineering, 29(6), Article 137. https://doi.org/10.1007/s10664-024-10504-1

Journal Article Type	Article
Acceptance Date	May 24, 2024
Online Publication Date	Aug 17, 2024
Publication Date	2024-11
Deposit Date	Aug 23, 2024
Publicly Available Date	Aug 27, 2024
Journal	Empirical Software Engineering
Print ISSN	1382-3256
Electronic ISSN	1573-7616
Publisher	Springer
Peer Reviewed	Peer Reviewed
Volume	29
Issue	6
Article Number	137
DOI	https://doi.org/10.1007/s10664-024-10504-1
Keywords	Meta-analysis · Effect size · Non-parametric · Probability of superiority · Small sample sizes · Reproducible research
Public URL	https://keele-repository.worktribe.com/output/887834
Additional Information	Accepted: 24 May 2024; First Online: 17 August 2024; The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Licence
https://creativecommons.org/licenses/by/4.0/

Publisher Licence URL
https://creativecommons.org/licenses/by/4.0/

Copyright Statement
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.