A simulation study for evaluating the performance of clustering measures in multilevel logistic regression

BACKGROUND: Multilevel logistic regression models are widely used in health sciences research to account for clustering in multilevel data when estimating effects on subject binary outcomes of individual-level and cluster-level covariates. Several measures for quantifying between-cluster heterogenei...

Bibliographic Details
Main authors: Adam, Nicholas Siame; Twabi, Halima S.; Manda, Samuel O.M.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8590272/
https://www.ncbi.nlm.nih.gov/pubmed/34772354
http://dx.doi.org/10.1186/s12874-021-01417-4
_version_ 1784598922037035008
author Adam, Nicholas Siame
Twabi, Halima S.
Manda, Samuel O.M.
collection PubMed
description BACKGROUND: Multilevel logistic regression models are widely used in health sciences research to account for clustering in multilevel data when estimating the effects of individual-level and cluster-level covariates on binary subject outcomes. Several measures for quantifying between-cluster heterogeneity have been proposed. This study compared the performance of heterogeneity measures based on the between-cluster variance (the Intra-class Correlation Coefficient (ICC) and the Median Odds Ratio (MOR)) and heterogeneity measures based on cluster-level covariates (the 80% Interval Odds Ratio (IOR-80) and the Sorting Out Index (SOI)).
METHODS: We used several simulated datasets from a two-level logistic regression model to assess the performance of the four clustering measures. We also compared the four measures empirically in an analysis of childhood anemia in Malawi, investigating the importance of unexplained heterogeneity between communities and the effect of community geographic type (rural vs urban).
RESULTS: Our findings showed that the estimates of SOI and ICC were generally unbiased with at least 10 clusters and a cluster size of at least 20. On the other hand, estimates of MOR and IOR-80 were less accurate with 50 or fewer clusters regardless of the cluster size. The performance of the four clustering measures improved as the number of clusters and the cluster size increased, at all cluster variances. In the analysis of childhood anemia, the estimate of the between-community variance was 0.455, and the effect of community geographic type (rural vs urban) had an odds ratio (OR) of 1.21 (95% CI: 0.97, 1.52). The resulting estimates of ICC, MOR, IOR-80 and SOI were, respectively, 0.122 (indicative of low homogeneity of childhood anemia within the same community), 1.898 (indicative of large unexplained heterogeneity), 0.345 to 3.978, and 56.7% (implying that the between-community heterogeneity was more significant in explaining the variations in childhood anemia than the estimated effect of community geographic type).
CONCLUSION: At least 300 clusters with sizes of at least 50 would be adequate to estimate the strength of clustering in multilevel logistic regression with negligible bias. We recommend using the SOI to assess unexplained heterogeneity between clusters when the interest also involves the effect of cluster-level covariates; otherwise, the usual intra-cluster correlation coefficient would suffice in multilevel logistic regression analyses.
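As an illustration of how the four measures named in the abstract relate to a fitted model, the following is a minimal Python sketch. It assumes the commonly used definitions for a two-level random-intercept logistic model (latent-variable ICC with level-1 variance π²/3, and the MOR, IOR-80 and SOI expressed through the between-cluster variance and a cluster-level log odds ratio); this record does not state the formulas, so they are an assumption here, as are the function names. Pairing the reported variance (0.455) with the reported odds ratio (1.21) is also an assumption, since the record does not say which fitted model each value comes from, so the output should be close to the reported estimates without necessarily reproducing them.

```python
import math
from statistics import NormalDist

# Standard normal distribution, used for quantiles and the CDF.
_STD_NORMAL = NormalDist()


def icc_latent(var_u: float) -> float:
    """Latent-variable ICC: var_u / (var_u + pi^2 / 3)."""
    return var_u / (var_u + math.pi ** 2 / 3)


def median_odds_ratio(var_u: float) -> float:
    """MOR: exp(sqrt(2 * var_u) * z_0.75)."""
    return math.exp(math.sqrt(2 * var_u) * _STD_NORMAL.inv_cdf(0.75))


def interval_odds_ratio_80(beta: float, var_u: float) -> tuple[float, float]:
    """IOR-80: exp(beta -/+ z_0.90 * sqrt(2 * var_u))."""
    half_width = _STD_NORMAL.inv_cdf(0.90) * math.sqrt(2 * var_u)
    return math.exp(beta - half_width), math.exp(beta + half_width)


def sorting_out_index(beta: float, var_u: float) -> float:
    """SOI (assumed definition): Phi(|beta| / sqrt(2 * var_u))."""
    return _STD_NORMAL.cdf(abs(beta) / math.sqrt(2 * var_u))


if __name__ == "__main__":
    var_u = 0.455          # between-community variance from the abstract
    beta = math.log(1.21)  # log OR for rural vs urban (assumed same model)

    lower, upper = interval_odds_ratio_80(beta, var_u)
    print(f"ICC    = {icc_latent(var_u):.3f}")
    print(f"MOR    = {median_odds_ratio(var_u):.3f}")
    print(f"IOR-80 = {lower:.3f} to {upper:.3f}")
    print(f"SOI    = {sorting_out_index(beta, var_u):.1%}")
```

Under these assumed definitions, an IOR-80 interval that contains 1 and an SOI near 50% both indicate that between-cluster variation is large relative to the cluster-level covariate effect, which matches the interpretation given in the abstract.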
format Online
Article
Text
id pubmed-8590272
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8590272 2021-11-15 BMC Med Res Methodol Research BioMed Central 2021-11-13 /pmc/articles/PMC8590272/ /pubmed/34772354 http://dx.doi.org/10.1186/s12874-021-01417-4 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A simulation study for evaluating the performance of clustering measures in multilevel logistic regression
topic Research