
Adequate sample size for developing prediction models is not simply related to events per variable

Bibliographic Details
Main authors: Ogundimu, Emmanuel O., Altman, Douglas G., Collins, Gary S.
Format: Online article, text
Language: English
Published: Elsevier 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5045274/
https://www.ncbi.nlm.nih.gov/pubmed/26964707
http://dx.doi.org/10.1016/j.jclinepi.2016.02.031
Abstract:
OBJECTIVES: The choice of an adequate sample size for a Cox regression analysis is generally based on the rule of thumb derived from simulation studies of a minimum of 10 events per variable (EPV). One simulation study suggested scenarios in which the 10 EPV rule can be relaxed. The effect of a range of binary predictors with varying prevalence, reflecting clinical practice, has not yet been fully investigated.
STUDY DESIGN AND SETTING: We conducted an extended resampling study using a large general-practice data set, comprising over 2 million anonymized patient records, to examine the EPV requirements for prediction models with low-prevalence binary predictors developed using Cox regression. The performance of the models was then evaluated using an independent external validation data set. We investigated both fully specified models and models derived using variable selection.
RESULTS: Our results indicated that an EPV rule of thumb should be data driven and that EPV ≥ 20 generally eliminates bias in regression coefficients when many low-prevalence predictors are included in a Cox model.
CONCLUSION: Higher EPV is needed when low-prevalence predictors are present in a model to eliminate bias in regression coefficients and improve predictive accuracy.
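The EPV rule of thumb discussed in the abstract reduces to simple arithmetic. The sketch below is illustrative only and is not the authors' resampling method. It assumes the usual definition of EPV: the number of outcome events divided by the number of candidate predictor parameters. It also assumes a single expected event fraction for converting a target number of events into a cohort size. The function names are hypothetical. The abstract's own conclusion is that the appropriate EPV should be data driven, so a fixed target such as 10 or 20 is only a starting point.

```python
import math


def events_per_variable(n_events: int, n_parameters: int) -> float:
    """EPV, assumed here to be outcome events divided by the number of
    candidate predictor parameters (regression coefficients)."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return n_events / n_parameters


def min_events(n_parameters: int, target_epv: float) -> int:
    """Smallest whole number of events that reaches the target EPV."""
    return math.ceil(target_epv * n_parameters)


def min_sample_size(n_parameters: int, target_epv: float,
                    event_fraction: float) -> int:
    """Hypothetical helper: number of patients needed if a fraction
    `event_fraction` of them is expected to have the event during follow-up."""
    if not 0 < event_fraction <= 1:
        raise ValueError("event_fraction must be in (0, 1]")
    return math.ceil(min_events(n_parameters, target_epv) / event_fraction)


if __name__ == "__main__":
    # Compare the conventional EPV 10 rule with the EPV 20 threshold
    # reported in the abstract, for 12 parameters and a 5% event fraction.
    for epv in (10, 20):
        print(epv, min_events(12, epv), min_sample_size(12, epv, 0.05))
```

Consider 12 candidate parameters and an expected 5% event fraction. This calls for 120 events (2,400 patients) at EPV 10. At EPV 20 it calls for 240 events (4,800 patients). Doubling the EPV target doubles the required cohort when the event fraction is fixed.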
Journal: J Clin Epidemiol (Original Article), 2016-08
© 2016 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).