
Variable selection methods for identifying predictor interactions in data with repeatedly measured binary outcomes

INTRODUCTION: Identifying predictors of patient outcomes evaluated over time may require modeling interactions among variables while addressing within-subject correlation. Generalized linear mixed models (GLMMs) and generalized estimating equations (GEEs) address within-subject correlation, but iden...


Bibliographic Details
Main Authors:	Wolf, Bethany J.; Jiang, Yunyun; Wilson, Sylvia H.; Oates, Jim C.
Format:	Online Article Text
Language:	English
Published:	Cambridge University Press 2020
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057419/ https://www.ncbi.nlm.nih.gov/pubmed/33948279 http://dx.doi.org/10.1017/cts.2020.556

author	Wolf, Bethany J.; Jiang, Yunyun; Wilson, Sylvia H.; Oates, Jim C.
author_sort	Wolf, Bethany J.
collection	PubMed
description	INTRODUCTION: Identifying predictors of patient outcomes evaluated over time may require modeling interactions among variables while addressing within-subject correlation. Generalized linear mixed models (GLMMs) and generalized estimating equations (GEEs) address within-subject correlation, but identifying interactions can be difficult if not hypothesized a priori. We evaluate the performance of several variable selection approaches for clustered binary outcomes to provide guidance for choosing between the methods. METHODS: We conducted simulations comparing stepwise selection, penalized GLMM, boosted GLMM, and boosted GEE for variable selection considering main effects and two-way interactions in data with repeatedly measured binary outcomes, and evaluated a two-stage approach to reduce bias and error in parameter estimates. We compared these approaches in real data applications: hypothermia during surgery and treatment response in lupus nephritis. RESULTS: Penalized and boosted approaches recovered correct predictors and interactions more frequently than stepwise selection. Penalized GLMM recovered correct predictors more often than boosting, but included many spurious predictors. Boosted GLMM yielded parsimonious models and identified correct predictors well at large sample and effect sizes, but required excessive computation time. Boosted GEE was computationally efficient and selected relatively parsimonious models, offering a compromise between computation and parsimony. The two-stage approach reduced the bias and error in regression parameters in all approaches. CONCLUSION: Penalized and boosted approaches are effective for variable selection in data with clustered binary outcomes. The two-stage approach reduces bias and error and should be applied regardless of method. We provide guidance for choosing the most appropriate method in real applications.
format	Online Article Text
id	pubmed-8057419
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Cambridge University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8057419 2021-05-03 J Clin Transl Sci Research Article Cambridge University Press 2020-11-16 /pmc/articles/PMC8057419/ /pubmed/33948279 http://dx.doi.org/10.1017/cts.2020.556 Text en © The Association for Clinical and Translational Science 2020 https://creativecommons.org/licenses/by/4.0/ This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Variable selection methods for identifying predictor interactions in data with repeatedly measured binary outcomes
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057419/ https://www.ncbi.nlm.nih.gov/pubmed/33948279 http://dx.doi.org/10.1017/cts.2020.556
