
A flexible empirical Bayes approach to multivariate multiple regression, and its improved accuracy in predicting multi-tissue gene expression from genotypes

Predicting phenotypes from genotypes is a fundamental task in quantitative genetics. With technological advances, it is now possible to measure multiple phenotypes in large samples. Multiple phenotypes can share their genetic component; therefore, modeling these phenotypes jointly may improve predic...

Bibliographic Details
Main Authors: Morgante, Fabio; Carbonetto, Peter; Wang, Gao; Zou, Yuxin; Sarkar, Abhishek; Stephens, Matthew
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10355440/
https://www.ncbi.nlm.nih.gov/pubmed/37418505
http://dx.doi.org/10.1371/journal.pgen.1010539
_version_ 1785075142574997504
author Morgante, Fabio
Carbonetto, Peter
Wang, Gao
Zou, Yuxin
Sarkar, Abhishek
Stephens, Matthew
author_sort Morgante, Fabio
collection PubMed
description Predicting phenotypes from genotypes is a fundamental task in quantitative genetics. With technological advances, it is now possible to measure multiple phenotypes in large samples. Multiple phenotypes can share their genetic component; therefore, modeling these phenotypes jointly may improve prediction accuracy by leveraging effects that are shared across phenotypes. However, effects can be shared across phenotypes in a variety of ways, so computationally efficient statistical methods are needed that can accurately and flexibly capture patterns of effect sharing. Here, we describe new Bayesian multivariate, multiple regression methods that, by using flexible priors, are able to model and adapt to different patterns of effect sharing and specificity across phenotypes. Simulation results show that these new methods are fast and improve prediction accuracy compared with existing methods in a wide range of settings where effects are shared. Further, in settings where effects are not shared, our methods still perform competitively with state-of-the-art methods. In real data analyses of expression data in the Genotype Tissue Expression (GTEx) project, our methods improve prediction performance on average for all tissues, with the greatest gains in tissues where effects are strongly shared, and in the tissues with smaller sample sizes. While we use gene expression prediction to illustrate our methods, the methods are generally applicable to any multi-phenotype applications, including prediction of polygenic scores and breeding values. Thus, our methods have the potential to provide improvements across fields and organisms.
format Online
Article
Text
id pubmed-10355440
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10355440 2023-07-20 A flexible empirical Bayes approach to multivariate multiple regression, and its improved accuracy in predicting multi-tissue gene expression from genotypes Morgante, Fabio Carbonetto, Peter Wang, Gao Zou, Yuxin Sarkar, Abhishek Stephens, Matthew PLoS Genet Methods
Public Library of Science 2023-07-07 /pmc/articles/PMC10355440/ /pubmed/37418505 http://dx.doi.org/10.1371/journal.pgen.1010539 Text en © 2023 Morgante et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A flexible empirical Bayes approach to multivariate multiple regression, and its improved accuracy in predicting multi-tissue gene expression from genotypes
title_sort flexible empirical bayes approach to multivariate multiple regression, and its improved accuracy in predicting multi-tissue gene expression from genotypes
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10355440/
https://www.ncbi.nlm.nih.gov/pubmed/37418505
http://dx.doi.org/10.1371/journal.pgen.1010539