
Selection of variables for multivariable models: Opportunities and limitations in quantifying model stability by resampling

Statistical models are often fitted to obtain a concise description of the association of an outcome variable with some covariates. Even if background knowledge is available to guide preselection of covariates, stepwise variable selection is commonly applied to remove irrelevant ones. This practice...


Bibliographic Details
Main Authors:	Wallisch, Christine, Dunkler, Daniela, Rauch, Geraldine, de Bin, Riccardo, Heinze, Georg
Format:	Online Article Text
Language:	English
Published:	John Wiley & Sons, Inc. 2020
Subjects:	Research Articles
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820988/ https://www.ncbi.nlm.nih.gov/pubmed/33089538 http://dx.doi.org/10.1002/sim.8779

_version_	1783639327286231040
author	Wallisch, Christine Dunkler, Daniela Rauch, Geraldine de Bin, Riccardo Heinze, Georg
author_sort	Wallisch, Christine
collection	PubMed
description	Statistical models are often fitted to obtain a concise description of the association of an outcome variable with some covariates. Even if background knowledge is available to guide preselection of covariates, stepwise variable selection is commonly applied to remove irrelevant ones. This practice may introduce additional variability and selection is rarely certain. However, these issues are often ignored and model stability is not questioned. Several resampling‐based measures were proposed to describe model stability, including variable inclusion frequencies (VIFs), model selection frequencies, relative conditional bias (RCB), and root mean squared difference ratio (RMSDR). The latter two were recently proposed to assess bias and variance inflation induced by variable selection. Here, we study the consistency and accuracy of resampling estimates of these measures and the optimal choice of the resampling technique. In particular, we compare subsampling and bootstrapping for assessing stability of linear, logistic, and Cox models obtained by backward elimination in a simulation study. Moreover, we exemplify the estimation and interpretation of all suggested measures in a study on cardiovascular risk. The VIF and the model selection frequency are only consistently estimated in the subsampling approach. By contrast, the bootstrap is advantageous in terms of bias and precision for estimating the RCB as well as the RMSDR. Though, unbiased estimation of the latter quantity requires independence of covariates, which is rarely encountered in practice. Our study stresses the importance of addressing model stability after variable selection and shows how to cope with it.
format	Online Article Text
id	pubmed-7820988
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	John Wiley & Sons, Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-7820988 2021-01-26 Stat Med Research Articles John Wiley & Sons, Inc. 2020-10-21 2021-01-30 /pmc/articles/PMC7820988/ /pubmed/33089538 http://dx.doi.org/10.1002/sim.8779 Text en © 2020 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title	Selection of variables for multivariable models: Opportunities and limitations in quantifying model stability by resampling
topic	Research Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820988/ https://www.ncbi.nlm.nih.gov/pubmed/33089538 http://dx.doi.org/10.1002/sim.8779
