No rationale for 1 variable per 10 events criterion for binary logistic regression analysis

BACKGROUND: Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine...

Bibliographic Details
Main authors: van Smeden, Maarten; de Groot, Joris A. H.; Moons, Karel G. M.; Collins, Gary S.; Altman, Douglas G.; Eijkemans, Marinus J. C.; Reitsma, Johannes B.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5122171/
https://www.ncbi.nlm.nih.gov/pubmed/27881078
http://dx.doi.org/10.1186/s12874-016-0267-3
author van Smeden, Maarten
de Groot, Joris A. H.
Moons, Karel G. M.
Collins, Gary S.
Altman, Douglas G.
Eijkemans, Marinus J. C.
Reitsma, Johannes B.
collection PubMed
description BACKGROUND: Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. METHODS: The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared. RESULTS: The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. CONCLUSIONS: The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
format Online
Article
Text
id pubmed-5122171
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5122171 2016-11-30 BMC Med Res Methodol Research Article
BioMed Central 2016-11-24 /pmc/articles/PMC5122171/ /pubmed/27881078 http://dx.doi.org/10.1186/s12874-016-0267-3 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title No rationale for 1 variable per 10 events criterion for binary logistic regression analysis
topic Research Article
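
The kind of Monte Carlo experiment summarised in the abstract can be illustrated with a much smaller sketch. This is not the authors' simulation code. It assumes a single binary covariate, so that the maximum likelihood slope is simply the log odds ratio of the 2x2 table and is not finite whenever a cell is empty. EPV is taken here as the count of the less frequent outcome divided by the number of covariates. Adding 0.5 to every cell (the Haldane-Anscombe correction) is used only as a simple stand-in for a penalised estimate; it is not presented as Firth's correction itself. All function names and parameter values are hypothetical.

```python
import math
import random
from statistics import fmean


def events_per_variable(outcomes, n_covariates):
    """EPV, assumed here to be the less frequent outcome count per covariate."""
    events = sum(outcomes)
    return min(events, len(outcomes) - events) / n_covariates


def two_by_two(x, y):
    """Cell counts (x=1,y=1), (x=1,y=0), (x=0,y=1), (x=0,y=0)."""
    pairs = list(zip(x, y))
    return tuple(pairs.count(cell) for cell in [(1, 1), (1, 0), (0, 1), (0, 0)])


def has_zero_cell(cells):
    """True when the ML log odds ratio is not finite (includes separation)."""
    return min(cells) == 0


def log_odds_ratio(cells, add=0.0):
    """Log odds ratio; add=0.5 gives the Haldane-Anscombe corrected value."""
    a, b, c, d = (count + add for count in cells)
    return math.log((a * d) / (b * c))


def simulate(n, beta0, beta1, n_sims, seed=0):
    """Return (share of data sets with a zero cell,
    mean ML error in beta1 over data sets without a zero cell,
    mean corrected error in beta1 over all data sets)."""
    rng = random.Random(seed)
    zero_cells = 0
    ml, corrected = [], []
    for _ in range(n_sims):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = [int(rng.random() < 1 / (1 + math.exp(-(beta0 + beta1 * xi))))
             for xi in x]
        cells = two_by_two(x, y)
        corrected.append(log_odds_ratio(cells, add=0.5))
        if has_zero_cell(cells):
            zero_cells += 1  # dropped from the ML summary only
        else:
            ml.append(log_odds_ratio(cells))
    ml_bias = fmean(ml) - beta1 if ml else float("nan")
    return zero_cells / n_sims, ml_bias, fmean(corrected) - beta1


if __name__ == "__main__":
    for n in (20, 50, 200):
        share, ml_bias, corr_bias = simulate(n, -1.0, 0.5, n_sims=2000)
        print(n, round(share, 3), round(ml_bias, 3), round(corr_bias, 3))
```

Dropping the zero-cell data sets from the maximum likelihood summary is only one of several possible ways to handle them, and the abstract reports that this choice can change simulation results substantially. Varying `n` and the coefficients in the loop gives a rough feel for how the share of such data sets changes with sample size.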