No rationale for 1 variable per 10 events criterion for binary logistic regression analysis
BACKGROUND: Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine...
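As a small illustration of the quantity under discussion, the sketch below computes EPV for a binary outcome and compares it with the 10 EPV rule of thumb. It assumes the common definition in which "events" means the size of the smaller outcome group and "variables" means the number of regression coefficients excluding the intercept. The function name `events_per_variable` and the example data are hypothetical and do not come from this record.

```python
def events_per_variable(y, n_predictors):
    """Return EPV for a 0/1 outcome vector.

    Assumption: 'events' is the count of the less frequent outcome class,
    and n_predictors excludes the intercept.
    """
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    n_ones = sum(1 for value in y if value == 1)
    n_zeros = len(y) - n_ones
    n_events = min(n_ones, n_zeros)
    return n_events / n_predictors


# Hypothetical data set: 200 subjects, 30 with the outcome, 5 candidate predictors.
y = [1] * 30 + [0] * 170
epv = events_per_variable(y, n_predictors=5)
print(epv)        # 6.0
print(epv >= 10)  # False: below the 10 EPV rule of thumb
```

Note that the abstract argues that the evidence for such EPV thresholds is weak, so a check like this describes a data set rather than validating its sample size.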
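Main authors: | van Smeden, Maarten; de Groot, Joris A. H.; Moons, Karel G. M.; Collins, Gary S.; Altman, Douglas G.; Eijkemans, Marinus J. C.; Reitsma, Johannes B. |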
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5122171/ https://www.ncbi.nlm.nih.gov/pubmed/27881078 http://dx.doi.org/10.1186/s12874-016-0267-3 |
_version_ | 1782469522919260160 |
---|---|
author | van Smeden, Maarten; de Groot, Joris A. H.; Moons, Karel G. M.; Collins, Gary S.; Altman, Douglas G.; Eijkemans, Marinus J. C.; Reitsma, Johannes B. |
author_sort | van Smeden, Maarten |
collection | PubMed |
description | BACKGROUND: Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. METHODS: The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared. RESULTS: The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. CONCLUSIONS: The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis. |
format | Online Article Text |
id | pubmed-5122171 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5122171 2016-11-30 No rationale for 1 variable per 10 events criterion for binary logistic regression analysis van Smeden, Maarten; de Groot, Joris A. H.; Moons, Karel G. M.; Collins, Gary S.; Altman, Douglas G.; Eijkemans, Marinus J. C.; Reitsma, Johannes B. BMC Med Res Methodol Research Article BACKGROUND: Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. METHODS: The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared. RESULTS: The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. CONCLUSIONS: The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
BioMed Central 2016-11-24 /pmc/articles/PMC5122171/ /pubmed/27881078 http://dx.doi.org/10.1186/s12874-016-0267-3 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | No rationale for 1 variable per 10 events criterion for binary logistic regression analysis |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5122171/ https://www.ncbi.nlm.nih.gov/pubmed/27881078 http://dx.doi.org/10.1186/s12874-016-0267-3 |