Constanza L. Andaur Navarro
Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models
Navarro, Constanza L. Andaur; Damen, Johanna A. A.; van Smeden, Maarten; Takada, Toshihiko; Nijman, Steven W. J.; Dhiman, Paula; Ma, Jie; Collins, Gary S.; Bajpai, Ram; Riley, Richard D.; Moons, Karel G. M.; Hooft, Lotty
Authors
Johanna A. A. Damen
Maarten van Smeden
Toshihiko Takada
Steven W. J. Nijman
Paula Dhiman
Jie Ma
Gary S. Collins
Ram Bajpai (r.bajpai@keele.ac.uk)
Richard D. Riley
Karel G. M. Moons
Lotty Hooft
Abstract
Background and Objectives
We sought to summarize the study design, modelling strategies, and performance measures reported in studies on clinical prediction models developed using machine learning techniques.
Methods
We searched PubMed for articles published between 01/01/2018 and 31/12/2019 describing the development, or the development with external validation, of a multivariable prediction model using any supervised machine learning technique. We applied no restrictions on study design, data source, or predicted patient-related health outcomes.
Results
We included 152 studies: 58 (38.2% [95% CI 30.8–46.1]) were diagnostic and 94 (61.8% [95% CI 53.9–69.2]) were prognostic. Most studies reported only the development of a prediction model (n = 133, 87.5% [95% CI 81.3–91.8]), focused on binary outcomes (n = 131, 86.2% [95% CI 79.8–90.8]), and did not report a sample size calculation (n = 125, 82.2% [95% CI 75.4–87.5]). The most commonly used algorithms were support vector machines (n = 86/522, 16.5% [95% CI 13.5–19.9]) and random forests (n = 73/522, 14% [95% CI 11.3–17.2]). Reported values for the area under the receiver operating characteristic curve ranged from 0.45 to 1.00. Calibration metrics were often missing (n = 494/522, 94.6% [95% CI 92.4–96.3]).
Conclusion
Our review revealed that greater attention is needed to the handling of missing values, the methods used for internal validation, and the reporting of calibration to improve the methodological conduct of studies on machine learning-based prediction models.
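As context for the performance measures discussed above, the following is a minimal, illustrative Python sketch of how a study might report both discrimination (area under the ROC curve) and calibration (calibration intercept and slope) for a binary prediction model. It is not taken from the article or the reviewed studies; the use of scikit-learn and statsmodels, the function name, and the simulated data are assumptions for illustration only.

```python
# Illustrative sketch (not the authors' code): report discrimination (AUC)
# and calibration (intercept and slope) for a binary prediction model.
import numpy as np
from sklearn.metrics import roc_auc_score
import statsmodels.api as sm

def discrimination_and_calibration(y_true, y_prob):
    """Return AUC, calibration intercept, and calibration slope.

    y_true: observed binary outcomes (0/1); y_prob: predicted probabilities.
    """
    # Discrimination: area under the receiver operating characteristic curve.
    auc = roc_auc_score(y_true, y_prob)

    # Calibration: logistic regression of the outcome on the logit of the
    # predicted probabilities. A well-calibrated model has an intercept
    # close to 0 and a slope close to 1.
    eps = 1e-12
    p = np.clip(y_prob, eps, 1 - eps)
    logit_p = np.log(p / (1 - p))
    fit = sm.Logit(y_true, sm.add_constant(logit_p)).fit(disp=0)
    intercept, slope = fit.params
    return auc, intercept, slope

# Example with simulated data (purely for demonstration)
rng = np.random.default_rng(0)
p_sim = rng.uniform(0.05, 0.95, size=500)
y_sim = rng.binomial(1, p_sim)
print(discrimination_and_calibration(y_sim, p_sim))
```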
Citation
Navarro, C. L. A., Damen, J. A. A., van Smeden, M., Takada, T., Nijman, S. W. J., Dhiman, P., …Hooft, L. (2023). Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models. Journal of Clinical Epidemiology, 154, 8-22. https://doi.org/10.1016/j.jclinepi.2022.11.015
| Journal Article Type | Review |
| --- | --- |
| Acceptance Date | Nov 22, 2022 |
| Online Publication Date | Nov 25, 2022 |
| Publication Date | 2023-02 |
| Deposit Date | Jun 28, 2023 |
| Journal | Journal of Clinical Epidemiology |
| Print ISSN | 0895-4356 |
| Publisher | Elsevier |
| Peer Reviewed | Peer Reviewed |
| Volume | 154 |
| Pages | 8-22 |
| DOI | https://doi.org/10.1016/j.jclinepi.2022.11.015 |
| Public URL | https://keele-repository.worktribe.com/output/509479 |
| Publisher URL | https://www.sciencedirect.com/science/article/pii/S0895435622003006?via%3Dihub |