
Selection of variables for multivariable models: Opportunities and limitations in quantifying model stability by resampling

Statistical models are often fitted to obtain a concise description of the association of an outcome variable with some covariates. Even if background knowledge is available to guide preselection of covariates, stepwise variable selection is commonly applied to remove irrelevant ones. This practice...

Bibliographic Details
Main Authors: Wallisch, Christine, Dunkler, Daniela, Rauch, Geraldine, de Bin, Riccardo, Heinze, Georg
Format: Online Article Text
Language: English
Published: John Wiley & Sons, Inc. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820988/
https://www.ncbi.nlm.nih.gov/pubmed/33089538
http://dx.doi.org/10.1002/sim.8779
_version_ 1783639327286231040
author Wallisch, Christine
Dunkler, Daniela
Rauch, Geraldine
de Bin, Riccardo
Heinze, Georg
collection PubMed
description Statistical models are often fitted to obtain a concise description of the association of an outcome variable with some covariates. Even if background knowledge is available to guide preselection of covariates, stepwise variable selection is commonly applied to remove irrelevant ones. This practice may introduce additional variability and selection is rarely certain. However, these issues are often ignored and model stability is not questioned. Several resampling‐based measures were proposed to describe model stability, including variable inclusion frequencies (VIFs), model selection frequencies, relative conditional bias (RCB), and root mean squared difference ratio (RMSDR). The latter two were recently proposed to assess bias and variance inflation induced by variable selection. Here, we study the consistency and accuracy of resampling estimates of these measures and the optimal choice of the resampling technique. In particular, we compare subsampling and bootstrapping for assessing stability of linear, logistic, and Cox models obtained by backward elimination in a simulation study. Moreover, we exemplify the estimation and interpretation of all suggested measures in a study on cardiovascular risk. The VIF and the model selection frequency are only consistently estimated in the subsampling approach. By contrast, the bootstrap is advantageous in terms of bias and precision for estimating the RCB as well as the RMSDR. Though, unbiased estimation of the latter quantity requires independence of covariates, which is rarely encountered in practice. Our study stresses the importance of addressing model stability after variable selection and shows how to cope with it.
format Online
Article
Text
id pubmed-7820988
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher John Wiley & Sons, Inc.
record_format MEDLINE/PubMed
spelling pubmed-7820988 2021-01-26 Stat Med Research Articles John Wiley & Sons, Inc. 2020-10-21 2021-01-30 /pmc/articles/PMC7820988/ /pubmed/33089538 http://dx.doi.org/10.1002/sim.8779 Text en © 2020 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Selection of variables for multivariable models: Opportunities and limitations in quantifying model stability by resampling
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820988/
https://www.ncbi.nlm.nih.gov/pubmed/33089538
http://dx.doi.org/10.1002/sim.8779
work_keys_str_mv AT wallischchristine selectionofvariablesformultivariablemodelsopportunitiesandlimitationsinquantifyingmodelstabilitybyresampling
AT dunklerdaniela selectionofvariablesformultivariablemodelsopportunitiesandlimitationsinquantifyingmodelstabilitybyresampling
AT rauchgeraldine selectionofvariablesformultivariablemodelsopportunitiesandlimitationsinquantifyingmodelstabilitybyresampling
AT debinriccardo selectionofvariablesformultivariablemodelsopportunitiesandlimitationsinquantifyingmodelstabilitybyresampling
AT heinzegeorg selectionofvariablesformultivariablemodelsopportunitiesandlimitationsinquantifyingmodelstabilitybyresampling
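
The description in this record mentions variable inclusion frequencies estimated by subsampling or bootstrapping around backward elimination. The following is a minimal sketch of that idea for a linear model, not the authors' implementation. Everything named here is an assumption made for illustration: the function names, the AIC stopping rule for backward elimination, the subsample fraction of 0.5, and the number of resamples.

```python
import numpy as np


def aic_linear(X, y):
    """AIC (up to an additive constant) of an OLS fit with intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]


def backward_elimination(X, y):
    """Drop one covariate at a time for as long as this lowers the AIC.

    Assumption: AIC is used as the stopping rule; a significance-level
    rule would be an equally plausible choice.
    """
    selected = list(range(X.shape[1]))
    best = aic_linear(X[:, selected], y)
    while selected:
        # AIC of every model obtained by removing exactly one covariate.
        candidates = [
            (aic_linear(X[:, [j for j in selected if j != k]], y), k)
            for k in selected
        ]
        cand_aic, drop = min(candidates)
        if cand_aic >= best:
            break
        best = cand_aic
        selected.remove(drop)
    return selected


def inclusion_frequencies(X, y, n_resamples=200, fraction=0.5,
                          bootstrap=False, seed=0):
    """Share of resamples in which each covariate survives selection."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        if bootstrap:
            # Bootstrap: n draws with replacement.
            idx = rng.integers(0, n, size=n)
        else:
            # Subsampling: fraction * n draws without replacement.
            idx = rng.choice(n, size=int(fraction * n), replace=False)
        for j in backward_elimination(X[idx], y[idx]):
            counts[j] += 1
    return counts / n_resamples


if __name__ == "__main__":
    # Hypothetical data: only the first of five covariates has an effect.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = 2.0 * X[:, 0] + rng.normal(size=200)
    print("subsampling:", inclusion_frequencies(X, y))
    print("bootstrap:  ", inclusion_frequencies(X, y, bootstrap=True))
```

Running both variants on the same data gives a rough feel for how the two resampling schemes can report different inclusion frequencies for the same covariate, which is the comparison the description refers to.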