Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review


Bibliographic Details
Main Authors: Dhiman, Paula, Ma, Jie, Andaur Navarro, Constanza L., Speich, Benjamin, Bullock, Garrett, Damen, Johanna A. A., Hooft, Lotty, Kirtley, Shona, Riley, Richard D., Van Calster, Ben, Moons, Karel G. M., Collins, Gary S.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8991704/
https://www.ncbi.nlm.nih.gov/pubmed/35395724
http://dx.doi.org/10.1186/s12874-022-01577-x
_version_ 1784683626200301568
author Dhiman, Paula
Ma, Jie
Andaur Navarro, Constanza L.
Speich, Benjamin
Bullock, Garrett
Damen, Johanna A. A.
Hooft, Lotty
Kirtley, Shona
Riley, Richard D.
Van Calster, Ben
Moons, Karel G. M.
Collins, Gary S.
author_facet Dhiman, Paula
Ma, Jie
Andaur Navarro, Constanza L.
Speich, Benjamin
Bullock, Garrett
Damen, Johanna A. A.
Hooft, Lotty
Kirtley, Shona
Riley, Richard D.
Van Calster, Ben
Moons, Karel G. M.
Collins, Gary S.
author_sort Dhiman, Paula
collection PubMed
description BACKGROUND: To describe and evaluate the methodological conduct of prognostic prediction models developed using machine learning methods in oncology. METHODS: We conducted a systematic review in MEDLINE and Embase between 01/01/2019 and 05/09/2019 for studies developing a prognostic prediction model using machine learning methods in oncology. We used the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, the Prediction model Risk Of Bias ASsessment Tool (PROBAST) and the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) to assess the methodological conduct of included publications. Results were summarised by modelling type: regression-based, non-regression-based and ensemble machine learning models. RESULTS: Sixty-two publications met the inclusion criteria, developing 152 models in total. Forty-two models were regression-based, 71 were non-regression-based and 39 were ensemble models. A median of 647 individuals (IQR: 203 to 4059) and 195 events (IQR: 38 to 1269) were used for model development, and 553 individuals (IQR: 69 to 3069) and 50 events (IQR: 17.5 to 326.5) for model validation. A higher number of events per predictor was used for developing regression-based models (median: 8, IQR: 7.1 to 23.5), compared to alternative machine learning models (median: 3.4, IQR: 1.1 to 19.1) and ensemble models (median: 1.7, IQR: 1.1 to 6). Sample size was rarely justified (n = 5/62; 8%). Some or all continuous predictors were categorised before modelling in 24 studies (39%). Of the models reporting predictor selection before modelling, 46% (n = 24/62) used univariable analyses, a common method across all modelling types. Ten out of 24 models for time-to-event outcomes accounted for censoring (42%). A split-sample approach was the most popular method for internal validation (n = 25/62, 40%). Calibration was reported in 11 studies. Less than half of the models were reported or made available. CONCLUSIONS: The methodological conduct of machine learning-based clinical prediction models is poor. Guidance is urgently needed, with increased awareness and education of minimum prediction modelling standards. Particular focus is needed on sample size estimation, development and validation analysis methods, and ensuring the model is available for independent validation, to improve the quality of machine learning-based clinical prediction models. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-022-01577-x.
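The events-per-predictor (EPP) ratio summarised in the abstract divides the number of outcome events in the development data by the number of candidate predictor parameters; low values signal a high risk of overfitting. A minimal sketch of the calculation, using hypothetical dataset sizes chosen only to illustrate the regression-model and ensemble-model medians the review reports (they are not data from any included study):

```python
def events_per_predictor(n_events: int, n_predictor_parameters: int) -> float:
    """EPP = outcome events / candidate predictor parameters.

    Both arguments refer to the model *development* data; a higher EPP
    generally means a lower risk of overfitting.
    """
    if n_predictor_parameters <= 0:
        raise ValueError("need at least one candidate predictor parameter")
    return n_events / n_predictor_parameters


# Hypothetical development datasets (illustrative only):
epp_regression = events_per_predictor(195, 24)  # 8.125, near the regression-model median of 8
epp_ensemble = events_per_predictor(50, 30)     # ~1.67, near the ensemble-model median of 1.7
print(epp_regression, epp_ensemble)
```

The same arithmetic underlies modern sample-size guidance for prediction models, where the number of candidate parameters (not just predictors, since a categorical predictor can contribute several parameters) is counted before any data-driven selection.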
format Online
Article
Text
id pubmed-8991704
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8991704 2022-04-09 Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review Dhiman, Paula Ma, Jie Andaur Navarro, Constanza L. Speich, Benjamin Bullock, Garrett Damen, Johanna A. A. Hooft, Lotty Kirtley, Shona Riley, Richard D. Van Calster, Ben Moons, Karel G. M. Collins, Gary S. BMC Med Res Methodol Research BioMed Central 2022-04-08 /pmc/articles/PMC8991704/ /pubmed/35395724 http://dx.doi.org/10.1186/s12874-022-01577-x Text en © The Author(s) 2022, licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/)
spellingShingle Research
Dhiman, Paula
Ma, Jie
Andaur Navarro, Constanza L.
Speich, Benjamin
Bullock, Garrett
Damen, Johanna A. A.
Hooft, Lotty
Kirtley, Shona
Riley, Richard D.
Van Calster, Ben
Moons, Karel G. M.
Collins, Gary S.
Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review
title Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review
title_full Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review
title_fullStr Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review
title_full_unstemmed Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review
title_short Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review
title_sort methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8991704/
https://www.ncbi.nlm.nih.gov/pubmed/35395724
http://dx.doi.org/10.1186/s12874-022-01577-x
work_keys_str_mv AT dhimanpaula methodologicalconductofprognosticpredictionmodelsdevelopedusingmachinelearninginoncologyasystematicreview
AT majie methodologicalconductofprognosticpredictionmodelsdevelopedusingmachinelearninginoncologyasystematicreview
AT andaurnavarroconstanzal methodologicalconductofprognosticpredictionmodelsdevelopedusingmachinelearninginoncologyasystematicreview
AT speichbenjamin methodologicalconductofprognosticpredictionmodelsdevelopedusingmachinelearninginoncologyasystematicreview
AT bullockgarrett methodologicalconductofprognosticpredictionmodelsdevelopedusingmachinelearninginoncologyasystematicreview
AT damenjohannaaa methodologicalconductofprognosticpredictionmodelsdevelopedusingmachinelearninginoncologyasystematicreview
AT hooftlotty methodologicalconductofprognosticpredictionmodelsdevelopedusingmachinelearninginoncologyasystematicreview
AT kirtleyshona methodologicalconductofprognosticpredictionmodelsdevelopedusingmachinelearninginoncologyasystematicreview
AT rileyrichardd methodologicalconductofprognosticpredictionmodelsdevelopedusingmachinelearninginoncologyasystematicreview
AT vancalsterben methodologicalconductofprognosticpredictionmodelsdevelopedusingmachinelearninginoncologyasystematicreview
AT moonskarelgm methodologicalconductofprognosticpredictionmodelsdevelopedusingmachinelearninginoncologyasystematicreview
AT collinsgarys methodologicalconductofprognosticpredictionmodelsdevelopedusingmachinelearninginoncologyasystematicreview