
Evaluating methods for Lasso selective inference in biomedical research: a comparative simulation study

BACKGROUND: Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical statistical frequentist theory, which assumes a fixed set of covariates in the model. This leads to over-optimistic selection and r...


Bibliographic Details
Main Authors: Kammer, Michael; Dunkler, Daniela; Michiels, Stefan; Heinze, Georg
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9316707/
https://www.ncbi.nlm.nih.gov/pubmed/35883041
http://dx.doi.org/10.1186/s12874-022-01681-y
author Kammer, Michael
Dunkler, Daniela
Michiels, Stefan
Heinze, Georg
author_sort Kammer, Michael
collection PubMed
description BACKGROUND: Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical statistical frequentist theory, which assumes a fixed set of covariates in the model. This leads to over-optimistic selection and replicability issues.
METHODS: We compared proposals for selective inference targeting the submodel parameters of the Lasso and its extension, the adaptive Lasso: sample splitting, selective inference conditional on the Lasso selection (SI), and universally valid post-selection inference (PoSI). We studied the properties of the proposed selective confidence intervals available via R software packages using a neutral simulation study inspired by real data commonly seen in biomedical studies. Furthermore, we present an exemplary application of these methods to a publicly available dataset to discuss their practical usability.
RESULTS: Frequentist properties of selective confidence intervals by the SI method were generally acceptable, but the claimed selective coverage levels were not attained in all scenarios, in particular with the adaptive Lasso. The actual coverage of the extremely conservative PoSI method exceeded the nominal levels, and this method also required the greatest computational effort. Sample splitting achieved acceptable actual selective coverage levels, but the method is inefficient and leads to less accurate point estimates. The choice of inference method had a large impact on the resulting interval estimates, thereby necessitating that the user is acutely aware of the goal of inference in order to interpret and communicate the results.
CONCLUSIONS: Despite violating nominal coverage levels in some scenarios, selective inference conditional on the Lasso selection is our recommended approach for most cases. If simplicity is strongly favoured over efficiency, then sample splitting is an alternative. If only few predictors undergo variable selection (i.e. up to 5) or the avoidance of false positive claims of significance is a concern, then the conservative approach of PoSI may be useful. For the adaptive Lasso, SI should be avoided and only PoSI and sample splitting are recommended. In summary, we find selective inference useful to assess the uncertainties in the importance of individual selected predictors for future applications.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-022-01681-y.
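The sample splitting approach named in the abstract can be illustrated with a minimal sketch: select variables with the Lasso on one half of the data, then fit ordinary least squares on the other half and report classical confidence intervals for the selected coefficients. This is not the R software evaluated in the study. It is a standard-library Python illustration under stated assumptions: the helper names (`lasso_cd`, `invert`, `ols_ci`, `sample_split_lasso`), the simulated data, the fixed penalty `lam=0.3`, the omission of an intercept, and the normal-approximation intervals are all choices made for this example.

```python
import random
from statistics import NormalDist


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso for (1/2n)||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta; beta starts at zero
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the partial residual (beta_j added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = max(abs(rho) - lam, 0.0) / col_ss[j]  # soft-thresholding
            new = new if rho > 0 else -new
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(A)
    M = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[k:] for row in M]


def ols_ci(X, y, cols, level=0.95):
    """OLS on the columns in `cols` (no intercept) with normal-approximation CIs."""
    n, k = len(X), len(cols)
    Z = [[row[j] for j in cols] for row in X]
    XtX = [[sum(Z[i][a] * Z[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    Xty = [sum(Z[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(XtX)
    beta = [sum(inv[a][b] * Xty[b] for b in range(k)) for a in range(k)]
    rss = sum((y[i] - sum(Z[i][a] * beta[a] for a in range(k))) ** 2 for i in range(n))
    sigma2 = rss / (n - k)  # residual variance estimate
    z = NormalDist().inv_cdf(0.5 + level / 2)
    out = {}
    for a, j in enumerate(cols):
        se = (sigma2 * inv[a][a]) ** 0.5
        out[j] = (beta[a] - z * se, beta[a] + z * se)
    return out


def sample_split_lasso(X, y, lam, seed=0):
    """Select with the Lasso on one random half; do OLS inference on the other half."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    sel, inf = idx[:half], idx[half:]
    beta = lasso_cd([X[i] for i in sel], [y[i] for i in sel], lam)
    cols = [j for j, b in enumerate(beta) if b != 0.0]
    return cols, ols_ci([X[i] for i in inf], [y[i] for i in inf], cols)


# Simulated data (an assumption for illustration): two true predictors, three noise ones.
rng = random.Random(1)
true_beta = [2.0, 0.0, 0.0, -1.5, 0.0]
X = [[rng.gauss(0, 1) for _ in true_beta] for _ in range(400)]
y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 1) for row in X]

selected, intervals = sample_split_lasso(X, y, lam=0.3)
for j in selected:
    low, high = intervals[j]
    print(f"x{j}: 95% CI [{low:.2f}, {high:.2f}]")
```

Because the inference half plays no part in choosing the model, the usual OLS intervals remain valid for the selected submodel. The price is that only half of the observations inform each step, which matches the inefficiency noted in the abstract.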
format Online
Article
Text
id pubmed-9316707
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9316707 2022-07-27 Evaluating methods for Lasso selective inference in biomedical research: a comparative simulation study Kammer, Michael; Dunkler, Daniela; Michiels, Stefan; Heinze, Georg BMC Med Res Methodol Research BioMed Central 2022-07-26 /pmc/articles/PMC9316707/ /pubmed/35883041 http://dx.doi.org/10.1186/s12874-022-01681-y Text en
© The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Evaluating methods for Lasso selective inference in biomedical research: a comparative simulation study
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9316707/
https://www.ncbi.nlm.nih.gov/pubmed/35883041
http://dx.doi.org/10.1186/s12874-022-01681-y