
An investigation of penalization and data augmentation to improve convergence of generalized estimating equations for clustered binary outcomes

BACKGROUND: In binary logistic regression data are ‘separable’ if there exists a linear combination of explanatory variables which perfectly predicts the observed outcome, leading to non-existence of some of the maximum likelihood coefficient estimates. A popular solution to obtain finite estimates...
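The separation problem described in this summary can be illustrated with a minimal sketch. It assumes a toy, unclustered data set with one covariate, so it shows ordinary logistic regression rather than GEE and is not the authors' implementation. The function name `fit_logistic` and the data are hypothetical. With `firth=True` the score is modified in the style of Firth's logistic regression (the approach named in the full abstract of this record), using the hat-matrix diagonals `h`; with `firth=False` it performs plain maximum likelihood by Fisher scoring.

```python
import math

def fit_logistic(x, y, firth=False, n_iter=25):
    """Fisher scoring for logit P(y=1) = b0 + b1*x (hypothetical helper).

    With firth=True, the score residual y - pi is replaced by
    y - pi + h*(0.5 - pi), where h are the hat-matrix diagonals.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        pi = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [p * (1.0 - p) for p in pi]
        # Fisher information X'WX for the design matrix with columns (1, x)
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det  # inverse of X'WX
        # Hat diagonals: h_i = w_i * (1, x_i) (X'WX)^{-1} (1, x_i)'
        h = [wi * (v00 + 2.0 * v01 * xi + v11 * xi * xi) for wi, xi in zip(w, x)]
        a = 1.0 if firth else 0.0
        r = [yi - p + a * hi * (0.5 - p) for yi, p, hi in zip(y, pi, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Scoring step: beta <- beta + (X'WX)^{-1} U
        b0 += v00 * u0 + v01 * u1
        b1 += v01 * u0 + v11 * u1
    return b0, b1

# Toy data (assumed): x > 0 perfectly predicts y = 1, so the data are separable.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

print("ML slope after 10 iterations:", fit_logistic(x, y, n_iter=10)[1])
print("ML slope after 20 iterations:", fit_logistic(x, y, n_iter=20)[1])
print("Firth-type slope after 50 iterations:", fit_logistic(x, y, firth=True, n_iter=50)[1])
```

On these data the maximum likelihood slope keeps growing as more iterations are allowed, which is the non-existence of a finite estimate that the summary refers to, while the Firth-type slope settles at a finite value.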


Bibliographic Details
Main authors:	Geroldinger, Angelika; Blagus, Rok; Ogden, Helen; Heinze, Georg
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2022
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9178839/ https://www.ncbi.nlm.nih.gov/pubmed/35681120 http://dx.doi.org/10.1186/s12874-022-01641-6

author	Geroldinger, Angelika; Blagus, Rok; Ogden, Helen; Heinze, Georg
collection	PubMed
description	BACKGROUND: In binary logistic regression data are ‘separable’ if there exists a linear combination of explanatory variables which perfectly predicts the observed outcome, leading to non-existence of some of the maximum likelihood coefficient estimates. A popular solution to obtain finite estimates even with separable data is Firth’s logistic regression (FL), which was originally proposed to reduce the bias in coefficient estimates. The question of convergence becomes more involved when analyzing clustered data as frequently encountered in clinical research, e.g. data collected in several study centers or when individuals contribute multiple observations, using marginal logistic regression models fitted by generalized estimating equations (GEE). From our experience we suspect that separable data are a sufficient, but not a necessary condition for non-convergence of GEE. Thus, we expect that generalizations of approaches that can handle separable uncorrelated data may reduce but not fully remove the non-convergence issues of GEE. METHODS: We investigate one recently proposed and two new extensions of FL to GEE. With ‘penalized GEE’ the GEE are treated as score equations, i.e. as derivatives of a log-likelihood set to zero, which are then modified as in FL. We introduce two approaches motivated by the equivalence of FL and maximum likelihood estimation with iteratively augmented data. Specifically, we consider fully iterated and single-step versions of this ‘augmented GEE’ approach. We compare the three approaches with respect to convergence behavior, practical applicability and performance using simulated data and a real data example. RESULTS: Our simulations indicate that all three extensions of FL to GEE substantially improve convergence compared to ordinary GEE, while showing a similar or even better performance in terms of accuracy of coefficient estimates and predictions. 
Penalized GEE often slightly outperforms the augmented GEE approaches, but this comes at the cost of a higher burden of implementation. CONCLUSIONS: When fitting marginal logistic regression models using GEE on sparse data we recommend to apply penalized GEE if one has access to a suitable software implementation and single-step augmented GEE otherwise. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-022-01641-6.
format	Online Article Text
id	pubmed-9178839
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-9178839 2022-06-10 An investigation of penalization and data augmentation to improve convergence of generalized estimating equations for clustered binary outcomes Geroldinger, Angelika; Blagus, Rok; Ogden, Helen; Heinze, Georg BMC Med Res Methodol Research BioMed Central 2022-06-09 /pmc/articles/PMC9178839/ /pubmed/35681120 http://dx.doi.org/10.1186/s12874-022-01641-6 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	An investigation of penalization and data augmentation to improve convergence of generalized estimating equations for clustered binary outcomes
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9178839/ https://www.ncbi.nlm.nih.gov/pubmed/35681120 http://dx.doi.org/10.1186/s12874-022-01641-6


Similar Items