Selection of variables for multivariable models: Opportunities and limitations in quantifying model stability by resampling
Statistical models are often fitted to obtain a concise description of the association of an outcome variable with some covariates. Even if background knowledge is available to guide preselection of covariates, stepwise variable selection is commonly applied to remove irrelevant ones. This practice...
Main authors: | Wallisch, Christine; Dunkler, Daniela; Rauch, Geraldine; de Bin, Riccardo; Heinze, Georg |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley & Sons, Inc. 2020 |
Subjects: | Research Articles |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820988/ https://www.ncbi.nlm.nih.gov/pubmed/33089538 http://dx.doi.org/10.1002/sim.8779 |
_version_ | 1783639327286231040 |
---|---|
author | Wallisch, Christine Dunkler, Daniela Rauch, Geraldine de Bin, Riccardo Heinze, Georg |
author_sort | Wallisch, Christine |
collection | PubMed |
description | Statistical models are often fitted to obtain a concise description of the association of an outcome variable with some covariates. Even if background knowledge is available to guide preselection of covariates, stepwise variable selection is commonly applied to remove irrelevant ones. This practice may introduce additional variability and selection is rarely certain. However, these issues are often ignored and model stability is not questioned. Several resampling‐based measures were proposed to describe model stability, including variable inclusion frequencies (VIFs), model selection frequencies, relative conditional bias (RCB), and root mean squared difference ratio (RMSDR). The latter two were recently proposed to assess bias and variance inflation induced by variable selection. Here, we study the consistency and accuracy of resampling estimates of these measures and the optimal choice of the resampling technique. In particular, we compare subsampling and bootstrapping for assessing stability of linear, logistic, and Cox models obtained by backward elimination in a simulation study. Moreover, we exemplify the estimation and interpretation of all suggested measures in a study on cardiovascular risk. The VIF and the model selection frequency are only consistently estimated in the subsampling approach. By contrast, the bootstrap is advantageous in terms of bias and precision for estimating the RCB as well as the RMSDR. Though, unbiased estimation of the latter quantity requires independence of covariates, which is rarely encountered in practice. Our study stresses the importance of addressing model stability after variable selection and shows how to cope with it. |
format | Online Article Text |
id | pubmed-7820988 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | John Wiley & Sons, Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7820988 2021-01-26 Selection of variables for multivariable models: Opportunities and limitations in quantifying model stability by resampling Wallisch, Christine Dunkler, Daniela Rauch, Geraldine de Bin, Riccardo Heinze, Georg Stat Med Research Articles John Wiley & Sons, Inc. 2020-10-21 2021-01-30 /pmc/articles/PMC7820988/ /pubmed/33089538 http://dx.doi.org/10.1002/sim.8779 Text en © 2020 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
title | Selection of variables for multivariable models: Opportunities and limitations in quantifying model stability by resampling |
topic | Research Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820988/ https://www.ncbi.nlm.nih.gov/pubmed/33089538 http://dx.doi.org/10.1002/sim.8779 |
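
The description field above mentions variable inclusion frequencies and model selection frequencies estimated by subsampling after backward elimination. The following is only a minimal sketch of that idea for a linear model, not the authors' implementation. Several details are assumptions made for illustration and do not come from the record: the AIC as the elimination criterion, the subsample fraction of 0.5, the simulated data, and all function and variable names.

```python
import numpy as np
from collections import Counter


def aic_ols(X, y):
    """AIC (up to an additive constant) of an OLS fit with an intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]


def backward_elimination(X, y):
    """Drop one covariate at a time for as long as that lowers the AIC.

    Assumption: AIC is used as the stopping rule; the record does not say
    which criterion the article applies.
    """
    selected = list(range(X.shape[1]))
    best = aic_ols(X[:, selected], y)
    while selected:
        trials = [(aic_ols(X[:, [k for k in selected if k != j]], y), j)
                  for j in selected]
        trial_aic, drop = min(trials)
        if trial_aic >= best:
            break  # no single removal improves the criterion
        best = trial_aic
        selected.remove(drop)
    return tuple(selected)


def stability_by_subsampling(X, y, n_resamples=200, fraction=0.5, seed=0):
    """Estimate inclusion and model selection frequencies by subsampling."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = int(fraction * n)  # hypothetical subsample size
    inclusion = np.zeros(p)
    models = Counter()
    for _ in range(n_resamples):
        # Subsampling draws m rows WITHOUT replacement.
        idx = rng.choice(n, size=m, replace=False)
        model = backward_elimination(X[idx], y[idx])
        for j in model:
            inclusion[j] += 1
        models[model] += 1
    model_freq = {k: v / n_resamples for k, v in models.items()}
    return inclusion / n_resamples, model_freq


# Simulated example: only the first two of five covariates affect the outcome.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 1.0 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(size=200)
inclusion_freq, model_freq = stability_by_subsampling(X, y)
print("inclusion frequencies:", np.round(inclusion_freq, 2))
print("most frequent model:", max(model_freq, key=model_freq.get))
```

Replacing the draw with `rng.choice(n, size=n, replace=True)` would give a bootstrap variant of the same loop, which is the other resampling scheme the description compares.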