
Adaptive sample size determination for the development of clinical prediction models


Bibliographic Details
Main authors: Christodoulou, Evangelia, van Smeden, Maarten, Edlinger, Michael, Timmerman, Dirk, Wanitschek, Maria, Steyerberg, Ewout W., Van Calster, Ben
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7983402/
https://www.ncbi.nlm.nih.gov/pubmed/33745449
http://dx.doi.org/10.1186/s41512-021-00096-5
collection PubMed
description BACKGROUND: We suggest an adaptive sample size calculation method for developing clinical prediction models, in which model performance is monitored sequentially as new data come in.
METHODS: We illustrate the approach using data for the diagnosis of ovarian cancer (n = 5914, 33% event fraction) and obstructive coronary artery disease (CAD; n = 4888, 44% event fraction). We used logistic regression to develop a prediction model consisting only of a priori selected predictors and assumed linear relations for continuous predictors. We mimicked prospective patient recruitment by developing the model on 100 randomly selected patients, and we used bootstrapping to internally validate the model. We sequentially added 50 random new patients until we reached a sample size of 3000 and re-estimated model performance at each step. We examined the required sample size for satisfying the following stopping rule: obtaining a calibration slope ≥ 0.9 and optimism in the c-statistic (or AUC) ≤ 0.02 at two consecutive sample sizes. This procedure was repeated 500 times. We also investigated the impact of alternative modeling strategies: modeling nonlinear relations for continuous predictors and correcting for bias in the model estimates (Firth’s correction).
RESULTS: Better discrimination was achieved in the ovarian cancer data (c-statistic 0.9 with 7 predictors) than in the CAD data (c-statistic 0.7 with 11 predictors). Adequate calibration and limited optimism in discrimination were achieved after a median of 450 patients (interquartile range 450–500) for the ovarian cancer data (22 events per parameter (EPP), 20–24) and 850 patients (750–900) for the CAD data (33 EPP, 30–35). A stricter criterion, requiring AUC optimism ≤ 0.01, was met with a median of 500 (23 EPP) and 1500 (59 EPP) patients, respectively. These sample sizes were much higher than the well-known 10 EPP rule of thumb and slightly higher than those suggested by a recently published fixed sample size calculation method by Riley et al. Higher sample sizes were required when nonlinear relationships were modeled, and lower sample sizes when Firth’s correction was used.
CONCLUSIONS: Adaptive sample size determination can be a useful supplement to fixed a priori sample size calculations, because it allows the sample size to be tailored to the specific prediction modeling context in a dynamic fashion.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s41512-021-00096-5.
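The stopping rule described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: it assumes the bootstrap-corrected calibration slope and the c-statistic optimism have already been estimated at each interim sample size, and it reports the second of the two consecutive qualifying sample sizes as the stopping point (the abstract does not say which of the two is reported). The names `Estimate`, `meets_criteria`, `stopping_sample_size` and `events_per_parameter` are hypothetical, and the interim values in the example are invented for demonstration.

```python
from typing import Iterable, NamedTuple, Optional


class Estimate(NamedTuple):
    """Internally validated performance at one interim sample size (hypothetical container)."""
    n: int                    # patients in the development sample so far
    calibration_slope: float  # bootstrap-corrected calibration slope
    auc_optimism: float       # apparent minus bootstrap-corrected c-statistic


def meets_criteria(est: Estimate, min_slope: float = 0.9,
                   max_optimism: float = 0.02) -> bool:
    """Check the two performance criteria at a single sample size."""
    return est.calibration_slope >= min_slope and est.auc_optimism <= max_optimism


def stopping_sample_size(estimates: Iterable[Estimate], min_slope: float = 0.9,
                         max_optimism: float = 0.02,
                         consecutive: int = 2) -> Optional[int]:
    """Return the sample size at which the criteria have held for
    `consecutive` successive interim looks, or None if that never happens.
    Assumption: the later of the consecutive looks is reported."""
    run = 0
    for est in estimates:
        run = run + 1 if meets_criteria(est, min_slope, max_optimism) else 0
        if run >= consecutive:
            return est.n
    return None


def events_per_parameter(n: int, event_fraction: float, n_parameters: int) -> float:
    """Expected events per parameter, using the overall event fraction
    (an approximation; the events actually observed in a sample may differ)."""
    return n * event_fraction / n_parameters


# Invented interim results, for demonstration only (not taken from the study).
estimates = [
    Estimate(100, 0.71, 0.060),
    Estimate(150, 0.92, 0.018),  # criteria met once
    Estimate(200, 0.88, 0.025),  # slope drops below 0.9, so the run resets
    Estimate(250, 0.91, 0.019),
    Estimate(300, 0.93, 0.015),  # second consecutive success
]

print(stopping_sample_size(estimates))
print(stopping_sample_size(estimates, max_optimism=0.01))
```

Requiring success at two consecutive looks guards against stopping on a single favourable fluctuation; tightening `max_optimism` to 0.01 corresponds to the stricter criterion mentioned in the results.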
id pubmed-7983402
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Diagn Progn Res
date 2021-03-22
license © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Adaptive sample size determination for the development of clinical prediction models
topic Research