Robust Huber-LASSO for improved prediction of protein, metabolite and gene expression levels relying on individual genotype data

Least absolute shrinkage and selection operator (LASSO) regression is often applied to select the most promising set of single nucleotide polymorphisms (SNPs) associated with a molecular phenotype of interest. While the penalization parameter λ restricts the number of selected SNPs and the potential...

Bibliographic Details
Main authors: Deutelmoser, Heike; Scherer, Dominique; Brenner, Hermann; Waldenberger, Melanie; Suhre, Karsten; Kastenmüller, Gabi; Lorenzo Bermejo, Justo
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects: Method Review
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8293825/
https://www.ncbi.nlm.nih.gov/pubmed/33063116
http://dx.doi.org/10.1093/bib/bbaa230
author Deutelmoser, Heike
Scherer, Dominique
Brenner, Hermann
Waldenberger, Melanie
Suhre, Karsten
Kastenmüller, Gabi
Lorenzo Bermejo, Justo
collection PubMed
description Least absolute shrinkage and selection operator (LASSO) regression is often applied to select the most promising set of single nucleotide polymorphisms (SNPs) associated with a molecular phenotype of interest. While the penalization parameter λ restricts the number of selected SNPs and the potential model overfitting, the least-squares loss function of standard LASSO regression translates into a strong dependence of statistical results on a small number of individuals with phenotypes or genotypes divergent from the majority of the study population—typically comprised of outliers and high-leverage observations. Robust methods have been developed to constrain the influence of divergent observations and generate statistical results that apply to the bulk of study data, but they have rarely been applied to genetic association studies. In this article, we review, for newcomers to the field of robust statistics, a novel version of standard LASSO that utilizes the Huber loss function. We conduct comprehensive simulations and analyze real protein, metabolite, mRNA expression and genotype data to compare the stability of penalization, the cross-iteration concordance of the model, the false-positive and true-positive rates and the prediction accuracy of standard and robust Huber-LASSO. Although the two methods showed controlled false-positive rates ≤2.1% and similar true-positive rates, robust Huber-LASSO outperformed standard LASSO in the accuracy of predicted protein, metabolite and gene expression levels using individual SNP data. The conducted simulations and real-data analyses show that robust Huber-LASSO represents a valuable alternative to standard LASSO in genetic studies of molecular phenotypes.
format Online
Article
Text
id pubmed-8293825
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8293825 2021-07-22 Robust Huber-LASSO for improved prediction of protein, metabolite and gene expression levels relying on individual genotype data Deutelmoser, Heike; Scherer, Dominique; Brenner, Hermann; Waldenberger, Melanie; Suhre, Karsten; Kastenmüller, Gabi; Lorenzo Bermejo, Justo Brief Bioinform Method Review Oxford University Press 2020-10-16 /pmc/articles/PMC8293825/ /pubmed/33063116 http://dx.doi.org/10.1093/bib/bbaa230 Text en © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Robust Huber-LASSO for improved prediction of protein, metabolite and gene expression levels relying on individual genotype data
topic Method Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8293825/
https://www.ncbi.nlm.nih.gov/pubmed/33063116
http://dx.doi.org/10.1093/bib/bbaa230