Evaluating methods for Lasso selective inference in biomedical research: a comparative simulation study

BACKGROUND: Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical statistical frequentist theory, which assumes a fixed set of covariates in the model. This leads to over-optimistic selection and r...

Bibliographic Details
Main authors:	Kammer, Michael, Dunkler, Daniela, Michiels, Stefan, Heinze, Georg
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2022
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9316707/ https://www.ncbi.nlm.nih.gov/pubmed/35883041 http://dx.doi.org/10.1186/s12874-022-01681-y

author	Kammer, Michael Dunkler, Daniela Michiels, Stefan Heinze, Georg
collection	PubMed
description	BACKGROUND: Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical statistical frequentist theory, which assumes a fixed set of covariates in the model. This leads to over-optimistic selection and replicability issues. METHODS: We compared proposals for selective inference targeting the submodel parameters of the Lasso and its extension, the adaptive Lasso: sample splitting, selective inference conditional on the Lasso selection (SI), and universally valid post-selection inference (PoSI). We studied the properties of the proposed selective confidence intervals available via R software packages using a neutral simulation study inspired by real data commonly seen in biomedical studies. Furthermore, we present an exemplary application of these methods to a publicly available dataset to discuss their practical usability. RESULTS: Frequentist properties of selective confidence intervals by the SI method were generally acceptable, but the claimed selective coverage levels were not attained in all scenarios, in particular with the adaptive Lasso. The actual coverage of the extremely conservative PoSI method exceeded the nominal levels, and this method also required the greatest computational effort. Sample splitting achieved acceptable actual selective coverage levels, but the method is inefficient and leads to less accurate point estimates. The choice of inference method had a large impact on the resulting interval estimates, thereby necessitating that the user is acutely aware of the goal of inference in order to interpret and communicate the results. CONCLUSIONS: Despite violating nominal coverage levels in some scenarios, selective inference conditional on the Lasso selection is our recommended approach for most cases. If simplicity is strongly favoured over efficiency, then sample splitting is an alternative. 
If only a few predictors undergo variable selection (i.e. up to 5) or the avoidance of false positive claims of significance is a concern, then the conservative approach of PoSI may be useful. For the adaptive Lasso, SI should be avoided and only PoSI and sample splitting are recommended. In summary, we find selective inference useful to assess the uncertainties in the importance of individual selected predictors for future applications. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-022-01681-y.
format	Online Article Text
id	pubmed-9316707
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-9316707 2022-07-27 Evaluating methods for Lasso selective inference in biomedical research: a comparative simulation study Kammer, Michael Dunkler, Daniela Michiels, Stefan Heinze, Georg BMC Med Res Methodol Research BioMed Central 2022-07-26 /pmc/articles/PMC9316707/ /pubmed/35883041 http://dx.doi.org/10.1186/s12874-022-01681-y Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Evaluating methods for Lasso selective inference in biomedical research: a comparative simulation study
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9316707/ https://www.ncbi.nlm.nih.gov/pubmed/35883041 http://dx.doi.org/10.1186/s12874-022-01681-y

Similar items