Evaluating methods for Lasso selective inference in biomedical research: a comparative simulation study
BACKGROUND: Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical statistical frequentist theory, which assumes a fixed set of covariates in the model. This leads to over-optimistic selection and r...
Main Authors: | Kammer, Michael; Dunkler, Daniela; Michiels, Stefan; Heinze, Georg |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9316707/ https://www.ncbi.nlm.nih.gov/pubmed/35883041 http://dx.doi.org/10.1186/s12874-022-01681-y |
_version_ | 1784754881283751936 |
---|---|
author | Kammer, Michael; Dunkler, Daniela; Michiels, Stefan; Heinze, Georg |
author_facet | Kammer, Michael; Dunkler, Daniela; Michiels, Stefan; Heinze, Georg |
author_sort | Kammer, Michael |
collection | PubMed |
description | BACKGROUND: Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical statistical frequentist theory, which assumes a fixed set of covariates in the model. This leads to over-optimistic selection and replicability issues. METHODS: We compared proposals for selective inference targeting the submodel parameters of the Lasso and its extension, the adaptive Lasso: sample splitting, selective inference conditional on the Lasso selection (SI), and universally valid post-selection inference (PoSI). We studied the properties of the proposed selective confidence intervals available via R software packages using a neutral simulation study inspired by real data commonly seen in biomedical studies. Furthermore, we present an exemplary application of these methods to a publicly available dataset to discuss their practical usability. RESULTS: Frequentist properties of selective confidence intervals by the SI method were generally acceptable, but the claimed selective coverage levels were not attained in all scenarios, in particular with the adaptive Lasso. The actual coverage of the extremely conservative PoSI method exceeded the nominal levels, and this method also required the greatest computational effort. Sample splitting achieved acceptable actual selective coverage levels, but the method is inefficient and leads to less accurate point estimates. The choice of inference method had a large impact on the resulting interval estimates, thereby necessitating that the user is acutely aware of the goal of inference in order to interpret and communicate the results. CONCLUSIONS: Despite violating nominal coverage levels in some scenarios, selective inference conditional on the Lasso selection is our recommended approach for most cases. If simplicity is strongly favoured over efficiency, then sample splitting is an alternative. 
If only few predictors undergo variable selection (i.e. up to 5) or the avoidance of false positive claims of significance is a concern, then the conservative approach of PoSI may be useful. For the adaptive Lasso, SI should be avoided and only PoSI and sample splitting are recommended. In summary, we find selective inference useful to assess the uncertainties in the importance of individual selected predictors for future applications. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-022-01681-y. |
format | Online Article Text |
id | pubmed-9316707 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9316707 2022-07-27 Evaluating methods for Lasso selective inference in biomedical research: a comparative simulation study Kammer, Michael; Dunkler, Daniela; Michiels, Stefan; Heinze, Georg BMC Med Res Methodol Research BioMed Central 2022-07-26 /pmc/articles/PMC9316707/ /pubmed/35883041 http://dx.doi.org/10.1186/s12874-022-01681-y Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Kammer, Michael Dunkler, Daniela Michiels, Stefan Heinze, Georg Evaluating methods for Lasso selective inference in biomedical research: a comparative simulation study |
title | Evaluating methods for Lasso selective inference in biomedical research: a comparative simulation study |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9316707/ https://www.ncbi.nlm.nih.gov/pubmed/35883041 http://dx.doi.org/10.1186/s12874-022-01681-y |