Sample size for binary logistic prediction models: Beyond events per variable criteria

Binary logistic regression is one of the most frequently applied statistical approaches for developing clinical prediction models. Developers of such models often rely on an Events Per Variable criterion (EPV), notably EPV ≥10, to determine the minimal sample size required and the maximum number of...

Bibliographic Details
Main Authors: van Smeden, Maarten; Moons, Karel GM; de Groot, Joris AH; Collins, Gary S; Altman, Douglas G; Eijkemans, Marinus JC; Reitsma, Johannes B
Format: Online Article Text
Language: English
Published: SAGE Publications 2018
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6710621/
https://www.ncbi.nlm.nih.gov/pubmed/29966490
http://dx.doi.org/10.1177/0962280218784726
author van Smeden, Maarten
Moons, Karel GM
de Groot, Joris AH
Collins, Gary S
Altman, Douglas G
Eijkemans, Marinus JC
Reitsma, Johannes B
collection PubMed
description Binary logistic regression is one of the most frequently applied statistical approaches for developing clinical prediction models. Developers of such models often rely on an Events Per Variable criterion (EPV), notably EPV ≥10, to determine the minimal sample size required and the maximum number of candidate predictors that can be examined. We present an extensive simulation study in which we studied the influence of EPV, events fraction, number of candidate predictors, the correlations and distributions of candidate predictor variables, area under the ROC curve, and predictor effects on out-of-sample predictive performance of prediction models. The out-of-sample performance (calibration, discrimination and probability prediction error) of developed prediction models was studied before and after regression shrinkage and variable selection. The results indicate that EPV does not have a strong relation with metrics of predictive performance, and is not an appropriate criterion for (binary) prediction model development studies. We show that out-of-sample predictive performance can better be approximated by considering the number of predictors, the total sample size and the events fraction. We propose that the development of new sample size criteria for prediction models should be based on these three parameters, and provide suggestions for improving sample size determination.
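The EPV arithmetic referred to in the description can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names are hypothetical, EPV is taken in its usual sense (number of events divided by number of candidate predictors), and the events fraction is assumed to be the proportion of the less frequent outcome (at most 0.5).

```python
import math


def events_per_variable(n_total: int, events_fraction: float, n_predictors: int) -> float:
    """Return EPV = (n_total * events_fraction) / n_predictors.

    Assumes events_fraction is the proportion of the less frequent outcome.
    """
    if not 0 < events_fraction <= 0.5:
        raise ValueError("events_fraction must be in (0, 0.5]")
    if n_predictors < 1:
        raise ValueError("n_predictors must be at least 1")
    return n_total * events_fraction / n_predictors


def min_sample_size_for_epv(target_epv: float, events_fraction: float, n_predictors: int) -> int:
    """Smallest total sample size whose expected events reach target_epv per predictor."""
    if not 0 < events_fraction <= 0.5:
        raise ValueError("events_fraction must be in (0, 0.5]")
    if n_predictors < 1:
        raise ValueError("n_predictors must be at least 1")
    n_events_needed = target_epv * n_predictors
    return math.ceil(n_events_needed / events_fraction)


# The same EPV >= 10 rule with 10 candidate predictors implies different
# total sample sizes depending on the events fraction.
for fraction in (0.5, 0.25):
    n = min_sample_size_for_epv(10, fraction, 10)
    print(fraction, n, events_per_variable(n, fraction, 10))
```

The loop shows that a fixed EPV threshold leaves the total sample size dependent on the events fraction, which are two of the three parameters the description proposes as the basis for sample size criteria.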
format Online
Article
Text
id pubmed-6710621
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-6710621 2019-09-17 Stat Methods Med Res Articles
SAGE Publications 2018-07-03 2019-08 /pmc/articles/PMC6710621/ /pubmed/29966490 http://dx.doi.org/10.1177/0962280218784726 Text en © The Author(s) 2018 http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title Sample size for binary logistic prediction models: Beyond events per variable criteria
topic Articles