A Multiple-Trait Bayesian Lasso for Genome-Enabled Analysis and Prediction of Complex Traits

A multiple-trait Bayesian LASSO (MBL) for genome-based analysis and prediction of quantitative traits is presented and applied to two real data sets. The data-generating model is a multivariate linear Bayesian regression on possibly a huge number of molecular markers, and with a Gaussian residual di...

Bibliographic Details
Main Authors:	Gianola, Daniel, Fernando, Rohan L.
Format:	Online Article Text
Language:	English
Published:	Genetics Society of America 2020
Subjects:	Investigations
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7017027/ https://www.ncbi.nlm.nih.gov/pubmed/31879318 http://dx.doi.org/10.1534/genetics.119.302934

collection	PubMed
description	A multiple-trait Bayesian LASSO (MBL) for genome-based analysis and prediction of quantitative traits is presented and applied to two real data sets. The data-generating model is a multivariate linear Bayesian regression on possibly a huge number of molecular markers, and with a Gaussian residual distribution posed. Each (one per marker) of the T × 1 vectors of regression coefficients (T: number of traits) is assigned the same T-variate Laplace prior distribution, with a null mean vector and unknown scale matrix Σ. The multivariate prior reduces to that of the standard univariate Bayesian LASSO when T = 1. The covariance matrix of the residual distribution is assigned a multivariate Jeffreys prior, and Σ is given an inverse-Wishart prior. The unknown quantities in the model are learned using a Markov chain Monte Carlo sampling scheme constructed using a scale-mixture of normal distributions representation. MBL is demonstrated in a bivariate context employing two publicly available data sets using a bivariate genomic best linear unbiased prediction model (GBLUP) for benchmarking results. The first data set is one where wheat grain yields in two different environments are treated as distinct traits. The second data set comes from genotyped Pinus trees, with each individual measured for two traits: rust bin and gall volume. In MBL, the bivariate marker effects are shrunk differentially, i.e., “short” vectors are more strongly shrunk toward the origin than in GBLUP; conversely, “long” vectors are shrunk less. A predictive comparison was carried out as well in wheat, where the comparators of MBL were bivariate GBLUP and bivariate Bayes Cπ, a variable selection procedure. A training-testing layout was used, with 100 random reconstructions of training and testing sets. For the wheat data, all methods produced similar predictions. In Pinus, MBL gave better predictions than either a Bayesian bivariate GBLUP or the single-trait Bayesian LASSO. MBL has been implemented in the Julia language package JWAS, and is now available for the scientific community to explore with different traits, species, and environments. It is well known that there is no universally best prediction machine, and MBL represents a new resource in the armamentarium for genome-enabled analysis and prediction of complex traits.
id	pubmed-7017027
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-7017027 2020-06-30 Genetics Investigations Genetics Society of America 2020-02 2019-12-26 /pmc/articles/PMC7017027/ /pubmed/31879318 http://dx.doi.org/10.1534/genetics.119.302934 Text en Copyright © 2020 by the Genetics Society of America. Available freely online through the author-supported open access option.
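
The description above states that the sampler is built on a scale-mixture of normal distributions representation and that the prior reduces to the standard univariate Bayesian LASSO in the single-trait case. As a minimal sketch of that idea for the univariate case only, the Python snippet below draws Laplace variates by first drawing an exponential variance and then a normal deviate given that variance. This is the textbook exponential-mixing representation of the Laplace density, not the T-variate sampler of the article or the JWAS implementation; the names `draw_laplace_via_scale_mixture`, `summarize`, and `lam` are hypothetical and chosen only for illustration.

```python
import math
import random


def draw_laplace_via_scale_mixture(lam, rng):
    """Draw one Laplace(0, 1/lam) variate as a scale mixture of normals.

    Illustrative helper (hypothetical name), univariate case only.
    """
    # Mixing step: tau2 ~ Exponential with rate lam^2 / 2 (mean 2 / lam^2).
    tau2 = rng.expovariate(lam * lam / 2.0)
    # Conditional step: beta | tau2 ~ Normal(0, tau2).
    return rng.gauss(0.0, math.sqrt(tau2))


def summarize(draws):
    """Return the sample variance and the mean absolute value of the draws."""
    n = len(draws)
    mean = sum(draws) / n
    variance = sum((d - mean) ** 2 for d in draws) / (n - 1)
    mean_abs = sum(abs(d) for d in draws) / n
    return variance, mean_abs


rng = random.Random(2020)  # fixed seed so the run is reproducible
lam = 1.0                  # assumed regularization parameter for the demo
draws = [draw_laplace_via_scale_mixture(lam, rng) for _ in range(20000)]
variance, mean_abs = summarize(draws)

# For the Laplace density (lam / 2) * exp(-lam * |beta|), theory gives
# variance 2 / lam^2 and E|beta| = 1 / lam; the samples should be close.
print(f"sample variance = {variance:.3f}, sample E|beta| = {mean_abs:.3f}")
```

With `lam = 1.0`, the sample variance should land near 2 and the mean absolute value near 1, which is a quick check that the two-step draw reproduces a Laplace marginal. The practical appeal of the representation is that, conditional on the mixing variances, the regression coefficients are Gaussian, which is what makes a Gibbs-type sampling scheme tractable.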
