
Minimum sample size for developing a multivariable prediction model: PART II ‐ binary and time‐to‐event outcomes


Bibliographic Details
Main Authors:	Riley, Richard D; Snell, Kym IE; Ensor, Joie; Burke, Danielle L; Harrell Jr, Frank E; Moons, Karel GM; Collins, Gary S
Format:	Online Article Text
Language:	English
Published:	John Wiley and Sons Inc. 2018
Subjects:	Research Articles
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6519266/ https://www.ncbi.nlm.nih.gov/pubmed/30357870 http://dx.doi.org/10.1002/sim.7992

author	Riley, Richard D; Snell, Kym IE; Ensor, Joie; Burke, Danielle L; Harrell Jr, Frank E; Moons, Karel GM; Collins, Gary S
collection	PubMed
description	When designing a study to develop a new prediction model with binary or time‐to‐event outcomes, researchers should ensure their sample size is adequate in terms of the number of participants (n) and outcome events (E) relative to the number of predictor parameters (p) considered for inclusion. We propose that the minimum values of n and E (and subsequently the minimum number of events per predictor parameter, EPP) should be calculated to meet the following three criteria: (i) small optimism in predictor effect estimates as defined by a global shrinkage factor of ≥0.9, (ii) small absolute difference of ≤0.05 in the model's apparent and adjusted Nagelkerke's R², and (iii) precise estimation of the overall risk in the population. Criteria (i) and (ii) aim to reduce overfitting conditional on a chosen p, and require prespecification of the model's anticipated Cox‐Snell R², which we show can be obtained from previous studies. The values of n and E that meet all three criteria provide the minimum sample size required for model development. Upon application of our approach, a new diagnostic model for Chagas disease requires an EPP of at least 4.8 and a new prognostic model for recurrent venous thromboembolism requires an EPP of at least 23. This reinforces why rules of thumb (eg, 10 EPP) should be avoided. Researchers might additionally ensure the sample size gives precise estimates of key predictor effects; this is especially important when key categorical predictors have few events in some categories, as this may substantially increase the numbers required.
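For a binary outcome, the three criteria in the description can be turned into a small calculator. The Python sketch below assumes the following closed-form expressions: criterion (i) solves for the n at which the expected global shrinkage factor equals 0.9; criterion (ii) converts the ≤0.05 gap in Nagelkerke's R² into an equivalent shrinkage target and reuses the criterion (i) formula; criterion (iii) requires a normal-approximation 95% interval of ±0.05 around the outcome proportion. All function names, default arguments, and the example inputs are hypothetical and chosen for illustration; this is not the authors' software, and time‐to‐event outcomes are not covered.

```python
import math


def max_r2_cs(prevalence: float) -> float:
    """Largest attainable Cox-Snell R^2 for a binary outcome:
    1 - exp(2 * lnL_null / n), where lnL_null is the intercept-only log-likelihood."""
    phi = prevalence
    ln_l_null_per_n = phi * math.log(phi) + (1 - phi) * math.log(1 - phi)
    return 1 - math.exp(2 * ln_l_null_per_n)


def cox_snell_from_nagelkerke(r2_nagelkerke: float, prevalence: float) -> float:
    """Nagelkerke's R^2 is the Cox-Snell R^2 divided by its maximum, so invert that."""
    return r2_nagelkerke * max_r2_cs(prevalence)


def n_for_shrinkage(p: int, r2_cs: float, shrinkage: float) -> int:
    """Criterion (i): smallest n whose expected global shrinkage factor is `shrinkage`."""
    return math.ceil(p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))


def n_for_r2_optimism(p: int, r2_cs: float, prevalence: float, delta: float = 0.05) -> int:
    """Criterion (ii): keep apparent minus adjusted Nagelkerke R^2 <= delta.
    The gap is re-expressed as a shrinkage target, then criterion (i) is reused."""
    s_vh = r2_cs / (r2_cs + delta * max_r2_cs(prevalence))
    return n_for_shrinkage(p, r2_cs, s_vh)


def n_for_overall_risk(prevalence: float, margin: float = 0.05, z: float = 1.96) -> int:
    """Criterion (iii): estimate the overall outcome proportion to within +/- margin."""
    return math.ceil((z / margin) ** 2 * prevalence * (1 - prevalence))


def minimum_sample_size(p: int, r2_cs: float, prevalence: float,
                        shrinkage: float = 0.9, delta: float = 0.05,
                        margin: float = 0.05) -> tuple[int, float, float]:
    """Return (n, E, EPP): the largest n required by criteria (i) to (iii)."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if not 0 < r2_cs < max_r2_cs(prevalence):
        raise ValueError("r2_cs must lie between 0 and its maximum for this prevalence")
    n = max(
        n_for_shrinkage(p, r2_cs, shrinkage),
        n_for_r2_optimism(p, r2_cs, prevalence, delta),
        n_for_overall_risk(prevalence, margin),
    )
    events = n * prevalence
    return n, events, events / p


if __name__ == "__main__":
    # Hypothetical inputs: 24 candidate parameters, anticipated Cox-Snell R^2
    # of 0.288, outcome proportion 0.174.
    n, events, epp = minimum_sample_size(p=24, r2_cs=0.288, prevalence=0.174)
    print(n, round(events, 1), round(epp, 1))  # prints: 662 115.2 4.8
```

With these inputs criterion (ii) is the binding one (about 662 participants versus about 623 for criterion (i) and 221 for criterion (iii)), which illustrates why all three criteria are checked rather than a fixed EPP. Because Nagelkerke's R² is the Cox‐Snell R² divided by its maximum, `cox_snell_from_nagelkerke` shows one way a previously reported Nagelkerke value might be converted into the anticipated Cox‐Snell R² that criteria (i) and (ii) need.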
format	Online Article Text
id	pubmed-6519266
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	John Wiley and Sons Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-6519266 2019-05-23 Stat Med Research Articles John Wiley and Sons Inc.
2018-10-24 2019-03-30 /pmc/articles/PMC6519266/ /pubmed/30357870 http://dx.doi.org/10.1002/sim.7992 Text en © 2018 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title	Minimum sample size for developing a multivariable prediction model: PART II ‐ binary and time‐to‐event outcomes
topic	Research Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6519266/ https://www.ncbi.nlm.nih.gov/pubmed/30357870 http://dx.doi.org/10.1002/sim.7992
