Risk prediction in multicentre studies when there is confounding by cluster or informative cluster size

BACKGROUND: Clustered data arise in research when patients are clustered within larger units. Generalised Estimating Equations (GEE) and Generalised Linear Mixed Models (GLMM) can be used to provide marginal and cluster-specific inference and predictions, respectively. METHODS: Confounding by Cluster (CBC...

Bibliographic Details
Main authors: Pavlou, Menelaos, Ambler, Gareth, Omar, Rumana Z.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8254921/
https://www.ncbi.nlm.nih.gov/pubmed/34218793
http://dx.doi.org/10.1186/s12874-021-01321-x
_version_ 1783717802440392704
author Pavlou, Menelaos
Ambler, Gareth
Omar, Rumana Z.
author_facet Pavlou, Menelaos
Ambler, Gareth
Omar, Rumana Z.
author_sort Pavlou, Menelaos
collection PubMed
description BACKGROUND: Clustered data arise in research when patients are clustered within larger units. Generalised Estimating Equations (GEE) and Generalised Linear Mixed Models (GLMM) can be used to provide marginal and cluster-specific inference and predictions, respectively. METHODS: Confounding by Cluster (CBC) and Informative Cluster Size (ICS) are two complications that may arise when modelling clustered data. CBC can arise when the distribution of a predictor variable (termed ‘exposure’) varies between clusters, causing confounding of the exposure-outcome relationship. ICS means that the cluster size, conditional on covariates, is not independent of the outcome. In both situations, standard GEE and GLMM may provide biased or misleading inference, and modifications have been proposed. However, both CBC and ICS are routinely overlooked in the context of risk prediction, and their impact on the predictive ability of the models has been little explored. We study the effect of CBC and ICS on the predictive ability of risk models for binary outcomes when GEE and GLMM are used. We examine whether two simple approaches to handling CBC and ICS, which involve adjusting for the cluster mean of the exposure and for the cluster size, respectively, can improve the accuracy of predictions. RESULTS: Both CBC and ICS can be viewed as violations of the assumptions of the standard GLMM: the random effects are correlated with the exposure for CBC and with the cluster size for ICS. Based on these principles, we simulated data subject to CBC/ICS. The simulation studies suggested that the predictive ability of models derived using standard GLMM and GEE, ignoring CBC/ICS, was affected. Marginal predictions were found to be mis-calibrated. Adjusting for the cluster mean of the exposure or the cluster size improved calibration, discrimination and the overall predictive accuracy of marginal predictions by explaining part of the between-cluster variability. The presence of CBC/ICS did not affect the accuracy of conditional predictions. We illustrate these concepts using real data from a multicentre study with potential CBC. CONCLUSION: Ignoring CBC and ICS when developing prediction models for clustered data can affect the accuracy of marginal predictions. Adjusting for the cluster mean of the exposure or the cluster size can improve the predictive accuracy of marginal predictions. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-021-01321-x.
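The two adjustments named in the description (adding the cluster mean of the exposure and the cluster size as covariates) can be illustrated with a minimal sketch. This is not the authors' code. The simulated values, the function names `add_cluster_covariates`, `fit_logistic` and `log_lik`, and the use of plain logistic regression as a stand-in for a marginal model with an independence working correlation are all assumptions made here for illustration.

```python
import numpy as np

def add_cluster_covariates(x, cluster):
    """Return the cluster mean of x and the cluster size, one value per row."""
    _, inverse, counts = np.unique(cluster, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=x)
    return (sums / counts)[inverse], counts[inverse].astype(float)

def fit_logistic(X, y, n_iter=50):
    """Ordinary logistic regression fitted by Newton-Raphson (illustrative only)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def log_lik(X, y, beta):
    """Bernoulli log-likelihood of a logistic model."""
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

# Hypothetical simulation with confounding by cluster: the cluster-level
# mean of the exposure is correlated with the cluster random effect u.
rng = np.random.default_rng(0)
n_clusters = 50
sizes = rng.integers(10, 40, size=n_clusters)
cluster = np.repeat(np.arange(n_clusters), sizes)
u = rng.normal(0.0, 1.0, n_clusters)
exposure_centre = 0.8 * u + rng.normal(0.0, 0.6, n_clusters)
x = exposure_centre[cluster] + rng.normal(0.0, 1.0, cluster.size)
p_true = 1.0 / (1.0 + np.exp(-(-1.0 + 0.5 * x + u[cluster])))
y = rng.binomial(1, p_true)

x_bar, n_j = add_cluster_covariates(x, cluster)
ones = np.ones_like(x)
X_std = np.column_stack([ones, x])          # ignores CBC
X_adj = np.column_stack([ones, x, x_bar])   # adjusts for the cluster mean
# For informative cluster size, n_j would be added as a covariate instead.

beta_std = fit_logistic(X_std, y)
beta_adj = fit_logistic(X_adj, y)
ll_std = log_lik(X_std, y, beta_std)
ll_adj = log_lik(X_adj, y, beta_adj)
print("log-likelihood, unadjusted:", ll_std)
print("log-likelihood, cluster-mean adjusted:", ll_adj)
```

In practice a GEE or GLMM routine from a statistics package would replace `fit_logistic`; the sketch only shows how the two extra covariates are constructed and entered into the model.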
format Online
Article
Text
id pubmed-8254921
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8254921 2021-07-06 Risk prediction in multicentre studies when there is confounding by cluster or informative cluster size Pavlou, Menelaos Ambler, Gareth Omar, Rumana Z. BMC Med Res Methodol Research Article BioMed Central 2021-07-04 /pmc/articles/PMC8254921/ /pubmed/34218793 http://dx.doi.org/10.1186/s12874-021-01321-x Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research Article
Pavlou, Menelaos
Ambler, Gareth
Omar, Rumana Z.
Risk prediction in multicentre studies when there is confounding by cluster or informative cluster size
title Risk prediction in multicentre studies when there is confounding by cluster or informative cluster size
title_full Risk prediction in multicentre studies when there is confounding by cluster or informative cluster size
title_fullStr Risk prediction in multicentre studies when there is confounding by cluster or informative cluster size
title_full_unstemmed Risk prediction in multicentre studies when there is confounding by cluster or informative cluster size
title_short Risk prediction in multicentre studies when there is confounding by cluster or informative cluster size
title_sort risk prediction in multicentre studies when there is confounding by cluster or informative cluster size
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8254921/
https://www.ncbi.nlm.nih.gov/pubmed/34218793
http://dx.doi.org/10.1186/s12874-021-01321-x
work_keys_str_mv AT pavloumenelaos riskpredictioninmulticentrestudieswhenthereisconfoundingbyclusterorinformativeclustersize
AT amblergareth riskpredictioninmulticentrestudieswhenthereisconfoundingbyclusterorinformativeclustersize
AT omarrumanaz riskpredictioninmulticentrestudieswhenthereisconfoundingbyclusterorinformativeclustersize