
Developing clinical prediction models when adhering to minimum sample size recommendations: The importance of quantifying bootstrap variability in tuning parameters and predictive performance

Bibliographic Details
Main Authors: Martin, Glen P, Riley, Richard D, Collins, Gary S, Sperrin, Matthew
Format: Online Article Text
Language: English
Published: SAGE Publications 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8649413/
https://www.ncbi.nlm.nih.gov/pubmed/34623193
http://dx.doi.org/10.1177/09622802211046388
collection PubMed
description Recent minimum sample size formulae (Riley et al.) for developing clinical prediction models help ensure that development datasets are of sufficient size to minimise overfitting. While these criteria are known to avoid excessive overfitting on average, the extent of variability in overfitting at recommended sample sizes is unknown. We investigated this through a simulation study and an empirical example, developing logistic regression clinical prediction models using unpenalised maximum likelihood estimation and various post-estimation shrinkage or penalisation methods. While the mean calibration slope was close to the ideal value of one for all methods, penalisation further reduced the level of overfitting, on average, compared to unpenalised methods. This came at the cost of higher variability in predictive performance for penalisation methods in external data. We recommend that penalisation methods be used in data that meet, or surpass, minimum sample size requirements to further mitigate overfitting, and that the variability in predictive performance and any tuning parameters should always be examined as part of the model development process, since this provides additional information over average (optimism-adjusted) performance alone. Lower variability would give reassurance that the developed clinical prediction model will perform well in new individuals from the same population as was used for model development.
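The recommendation above — to examine the bootstrap distribution of tuning parameters and of predictive performance, not just their averages — can be sketched in Python. This is a hypothetical illustration, not the authors' code: it uses scikit-learn's `LogisticRegressionCV` on simulated data, refits the penalised model on each bootstrap resample, and records the tuned ridge penalty and the out-of-bag calibration slope (the coefficient from regressing the outcome on the model's linear predictor).

```python
# Hypothetical sketch of quantifying bootstrap variability in a tuning
# parameter and in the calibration slope for a penalised logistic model.
import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

rng = np.random.default_rng(42)

# Simulate a development dataset (assumed data-generating model).
n, p = 500, 5
X = rng.normal(size=(n, p))
beta = np.array([0.5, -0.4, 0.3, 0.2, -0.1])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta - 0.5))))

def calibration_slope(model, X_val, y_val):
    """Slope of the outcome regressed on the model's linear predictor
    (ideal value: 1). C=1e6 makes the recalibration fit near-unpenalised."""
    lp = model.decision_function(X_val).reshape(-1, 1)
    recal = LogisticRegression(C=1e6, max_iter=1000).fit(lp, y_val)
    return recal.coef_[0, 0]

slopes, penalties = [], []
for _ in range(50):  # bootstrap resamples (use many more in practice)
    idx = rng.integers(0, n, n)
    oob = np.setdiff1d(np.arange(n), idx)  # out-of-bag validation set
    m = LogisticRegressionCV(Cs=10, cv=5, penalty="l2", max_iter=1000)
    m.fit(X[idx], y[idx])
    penalties.append(m.C_[0])  # tuned ridge penalty, as C = 1/lambda
    slopes.append(calibration_slope(m, X[oob], y[oob]))

# Report the spread, not only the mean, of performance and tuning.
print(f"calibration slope: mean={np.mean(slopes):.2f}, SD={np.std(slopes):.2f}")
print(f"tuned C: median={np.median(penalties):.3g}, "
      f"IQR {np.percentile(penalties, 25):.3g}-{np.percentile(penalties, 75):.3g}")
```

A wide spread in the calibration slopes or in the selected penalty across resamples would signal unstable model development even when the average (optimism-adjusted) slope looks acceptable, which is the paper's central point.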
id pubmed-8649413
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8649413 2021-12-08 Stat Methods Med Res Original Research Articles
SAGE Publications 2021-10-08 2021-12 /pmc/articles/PMC8649413/ /pubmed/34623193 http://dx.doi.org/10.1177/09622802211046388 Text en © The Author(s) 2021. This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/), which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title Developing clinical prediction models when adhering to minimum sample size recommendations: The importance of quantifying bootstrap variability in tuning parameters and predictive performance
topic Original Research Articles