Precision Lasso: accounting for correlations and linear dependencies in high-dimensional genomic data

MOTIVATION: Association studies to discover links between genetic markers and phenotypes are central to bioinformatics. Methods of regularized regression, such as variants of the Lasso, are popular for this task. Despite the good predictive performance of these methods in the average case, they suff...

Bibliographic Details
Main Authors: Wang, Haohan; Lengerich, Benjamin J; Aragam, Bryon; Xing, Eric P
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Original Papers
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6449749/
https://www.ncbi.nlm.nih.gov/pubmed/30184048
http://dx.doi.org/10.1093/bioinformatics/bty750
author Wang, Haohan
Lengerich, Benjamin J
Aragam, Bryon
Xing, Eric P
collection PubMed
description MOTIVATION: Association studies to discover links between genetic markers and phenotypes are central to bioinformatics. Methods of regularized regression, such as variants of the Lasso, are popular for this task. Despite the good predictive performance of these methods in the average case, they suffer from unstable selections of correlated variables and inconsistent selections of linearly dependent variables. Unfortunately, as we demonstrate empirically, such problematic situations of correlated and linearly dependent variables often exist in genomic datasets and lead to under-performance of classical methods of variable selection.
RESULTS: To address these challenges, we propose the Precision Lasso. Precision Lasso is a Lasso variant that promotes sparse variable selection by regularization governed by the covariance and inverse covariance matrices of explanatory variables. We illustrate its capacity for stable and consistent variable selection in simulated data with highly correlated and linearly dependent variables. We then demonstrate the effectiveness of the Precision Lasso to select meaningful variables from transcriptomic profiles of breast cancer patients. Our results indicate that in settings with correlated and linearly dependent variables, the Precision Lasso outperforms popular methods of variable selection such as the Lasso, the Elastic Net and Minimax Concave Penalty (MCP) regression.
AVAILABILITY AND IMPLEMENTATION: Software is available at https://github.com/HaohanWang/thePrecisionLasso.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6449749
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6449749 2019-04-09 Precision Lasso: accounting for correlations and linear dependencies in high-dimensional genomic data Wang, Haohan Lengerich, Benjamin J Aragam, Bryon Xing, Eric P Bioinformatics Original Papers
Oxford University Press 2019-04-01 2018-09-01 /pmc/articles/PMC6449749/ /pubmed/30184048 http://dx.doi.org/10.1093/bioinformatics/bty750 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Precision Lasso: accounting for correlations and linear dependencies in high-dimensional genomic data
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6449749/
https://www.ncbi.nlm.nih.gov/pubmed/30184048
http://dx.doi.org/10.1093/bioinformatics/bty750