Mega-scale Bayesian regression methods for genome-wide prediction and association studies with thousands of traits

Large-scale phenotype data are expected to increase the accuracy of genome-wide prediction and the power of genome-wide association analyses. However, genomic analyses of high-dimensional, highly correlated traits are challenging. We developed a method for implementing high-dimensional Bayesian mult...

Bibliographic Details
Main authors: Qu, Jiayi; Runcie, Daniel; Cheng, Hao
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9991502/
https://www.ncbi.nlm.nih.gov/pubmed/36529897
http://dx.doi.org/10.1093/genetics/iyac183
author Qu, Jiayi
Runcie, Daniel
Cheng, Hao
collection PubMed
description Large-scale phenotype data are expected to increase the accuracy of genome-wide prediction and the power of genome-wide association analyses. However, genomic analyses of high-dimensional, highly correlated traits are challenging. We developed a method for implementing high-dimensional Bayesian multivariate regression to simultaneously analyze genetic variants underlying thousands of traits. As a demonstration, we implemented the BayesC prior in the R package MegaLMM. Applied to Genomic Prediction, MegaBayesC effectively integrated hyperspectral reflectance data from 620 hyperspectral wavelengths to improve the accuracy of genetic value prediction on grain yield in a wheat dataset. Applied to Genome-Wide Association Studies, we used simulations to show that MegaBayesC can accurately estimate the effect sizes of QTL across a range of genetic architectures and causes of correlations among traits. To apply MegaBayesC to a realistic scenario involving whole-genome marker data, we developed a 2-stage procedure involving a preliminary step of candidate marker selection prior to multivariate regression. We then used MegaBayesC to identify genetic associations with flowering time in Arabidopsis thaliana, leveraging expression data from 20,843 genes. MegaBayesC selected 15 single nucleotide polymorphisms as important for flowering time, with 13 located within 100 kb of known flowering-time related genes, a higher validation rate than achieved by a single-stage analysis using only the flowering time data itself. These results demonstrate that MegaBayesC can efficiently and effectively leverage high-dimensional phenotypes in genetic analyses.
format Online
Article
Text
id pubmed-9991502
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9991502 2023-03-08 Genetics Investigation
Oxford University Press 2022-12-19 /pmc/articles/PMC9991502/ /pubmed/36529897 http://dx.doi.org/10.1093/genetics/iyac183 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of the Genetics Society of America. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Mega-scale Bayesian regression methods for genome-wide prediction and association studies with thousands of traits
topic Investigation