The relationship between statistical power and predictor distribution in multilevel logistic regression: a simulation-based approach
BACKGROUND: Despite its popularity, issues concerning the estimation of power in multilevel logistic regression models are prevalent because of the complexity involved in its calculation (i.e., computer-simulation-based approaches). These issues are further compounded by the fact that the distributi...
Main authors: | Olvera Astivia, Oscar L.; Gadermann, Anne; Guhn, Martin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6507150/ https://www.ncbi.nlm.nih.gov/pubmed/31072299 http://dx.doi.org/10.1186/s12874-019-0742-8 |
_version_ | 1783416971112480768 |
---|---|
author | Olvera Astivia, Oscar L. Gadermann, Anne Guhn, Martin |
author_facet | Olvera Astivia, Oscar L. Gadermann, Anne Guhn, Martin |
author_sort | Olvera Astivia, Oscar L. |
collection | PubMed |
description | BACKGROUND: Despite its popularity, issues concerning the estimation of power in multilevel logistic regression models are prevalent because of the complexity involved in its calculation (i.e., computer-simulation-based approaches). These issues are further compounded by the fact that the distribution of the predictors can play a role in the power to estimate these effects. To address both matters, we present a sample of cases documenting the influence that predictor distribution has on statistical power as well as a user-friendly, web-based application to conduct power analysis for multilevel logistic regression. METHOD: Computer simulations are implemented to estimate statistical power in multilevel logistic regression with varying numbers of clusters, varying cluster sample sizes, and non-normal and non-symmetrical distributions of the Level 1/2 predictors. Power curves were simulated to see in what ways non-normal/unbalanced distributions of a binary predictor and a continuous predictor affect the detection of population effect sizes for main effects, a cross-level interaction and the variance of the random effects. RESULTS: Skewed continuous predictors and unbalanced binary ones require larger sample sizes at both levels than balanced binary predictors and normally-distributed continuous ones. In the most extreme case of imbalance (10% incidence) and skewness of a chi-square distribution with 1 degree of freedom, even 110 Level 2 units and 100 Level 1 units were not sufficient for all predictors to reach power of 80%, mostly hovering at around 50% with the exception of the skewed, continuous Level 2 predictor.
CONCLUSIONS: Given the complex interactive influence among sample sizes, effect sizes and predictor distribution characteristics, it seems unwarranted to make generic rule-of-thumb sample size recommendations for multilevel logistic regression, aside from the fact that larger sample sizes are required when the distributions of the predictors are not symmetric or balanced. The more skewed or imbalanced the predictor is, the larger the sample size requirements. To assist researchers in planning research studies, a user-friendly web application that conducts power analysis via computer simulations in the R programming language is provided. With this web application, users can conduct simulations, tailored to their study design, to estimate statistical power for multilevel logistic regression models. |
format | Online Article Text |
id | pubmed-6507150 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6507150 2019-05-13 The relationship between statistical power and predictor distribution in multilevel logistic regression: a simulation-based approach Olvera Astivia, Oscar L. Gadermann, Anne Guhn, Martin BMC Med Res Methodol Research Article BACKGROUND: Despite its popularity, issues concerning the estimation of power in multilevel logistic regression models are prevalent because of the complexity involved in its calculation (i.e., computer-simulation-based approaches). These issues are further compounded by the fact that the distribution of the predictors can play a role in the power to estimate these effects. To address both matters, we present a sample of cases documenting the influence that predictor distribution has on statistical power as well as a user-friendly, web-based application to conduct power analysis for multilevel logistic regression. METHOD: Computer simulations are implemented to estimate statistical power in multilevel logistic regression with varying numbers of clusters, varying cluster sample sizes, and non-normal and non-symmetrical distributions of the Level 1/2 predictors. Power curves were simulated to see in what ways non-normal/unbalanced distributions of a binary predictor and a continuous predictor affect the detection of population effect sizes for main effects, a cross-level interaction and the variance of the random effects. RESULTS: Skewed continuous predictors and unbalanced binary ones require larger sample sizes at both levels than balanced binary predictors and normally-distributed continuous ones. In the most extreme case of imbalance (10% incidence) and skewness of a chi-square distribution with 1 degree of freedom, even 110 Level 2 units and 100 Level 1 units were not sufficient for all predictors to reach power of 80%, mostly hovering at around 50% with the exception of the skewed, continuous Level 2 predictor.
CONCLUSIONS: Given the complex interactive influence among sample sizes, effect sizes and predictor distribution characteristics, it seems unwarranted to make generic rule-of-thumb sample size recommendations for multilevel logistic regression, aside from the fact that larger sample sizes are required when the distributions of the predictors are not symmetric or balanced. The more skewed or imbalanced the predictor is, the larger the sample size requirements. To assist researchers in planning research studies, a user-friendly web application that conducts power analysis via computer simulations in the R programming language is provided. With this web application, users can conduct simulations, tailored to their study design, to estimate statistical power for multilevel logistic regression models. BioMed Central 2019-05-09 /pmc/articles/PMC6507150/ /pubmed/31072299 http://dx.doi.org/10.1186/s12874-019-0742-8 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Olvera Astivia, Oscar L. Gadermann, Anne Guhn, Martin The relationship between statistical power and predictor distribution in multilevel logistic regression: a simulation-based approach |
title | The relationship between statistical power and predictor distribution in multilevel logistic regression: a simulation-based approach |
title_full | The relationship between statistical power and predictor distribution in multilevel logistic regression: a simulation-based approach |
title_fullStr | The relationship between statistical power and predictor distribution in multilevel logistic regression: a simulation-based approach |
title_full_unstemmed | The relationship between statistical power and predictor distribution in multilevel logistic regression: a simulation-based approach |
title_short | The relationship between statistical power and predictor distribution in multilevel logistic regression: a simulation-based approach |
title_sort | relationship between statistical power and predictor distribution in multilevel logistic regression: a simulation-based approach |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6507150/ https://www.ncbi.nlm.nih.gov/pubmed/31072299 http://dx.doi.org/10.1186/s12874-019-0742-8 |
work_keys_str_mv | AT olveraastiviaoscarl therelationshipbetweenstatisticalpowerandpredictordistributioninmultilevellogisticregressionasimulationbasedapproach AT gadermannanne therelationshipbetweenstatisticalpowerandpredictordistributioninmultilevellogisticregressionasimulationbasedapproach AT guhnmartin therelationshipbetweenstatisticalpowerandpredictordistributioninmultilevellogisticregressionasimulationbasedapproach AT olveraastiviaoscarl relationshipbetweenstatisticalpowerandpredictordistributioninmultilevellogisticregressionasimulationbasedapproach AT gadermannanne relationshipbetweenstatisticalpowerandpredictordistributioninmultilevellogisticregressionasimulationbasedapproach AT guhnmartin relationshipbetweenstatisticalpowerandpredictordistributioninmultilevellogisticregressionasimulationbasedapproach |
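
For orientation, the kind of model named in the description field can be written as a two-level logistic regression. The notation below is an illustrative assumption and is not taken from this record: $x_{ij}$ stands for a Level 1 predictor, $z_j$ for a Level 2 predictor, the $\gamma$ terms for the fixed effects (two main effects and a cross-level interaction), and $u_{0j}$, $u_{1j}$ for the random effects. Whether the article's model includes a random slope is likewise assumed here. Simulation-based power is then the share of $R$ simulated data sets in which the test of a given parameter rejects at level $\alpha$, where $p_r$ is the p-value from replication $r$.

```latex
\begin{align}
\operatorname{logit}\Pr(y_{ij}=1)
  &= \gamma_{00} + \gamma_{10}\,x_{ij} + \gamma_{01}\,z_{j}
   + \gamma_{11}\,x_{ij}z_{j} + u_{0j} + u_{1j}\,x_{ij}, \\
(u_{0j},\,u_{1j})^{\top} &\sim N(\mathbf{0},\,\mathbf{T}), \\
\widehat{\text{power}}
  &= \frac{1}{R}\sum_{r=1}^{R}\mathbf{1}\{\,p_{r} < \alpha\,\}.
\end{align}
```

Under this sketch, changing the distribution of $x_{ij}$ or $z_j$ (for example, a binary predictor with 10% incidence instead of 50%) changes the simulated data sets and therefore the estimated power, which is the relationship the abstract reports.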