Mega-scale Bayesian regression methods for genome-wide prediction and association studies with thousands of traits
Main authors: | Qu, Jiayi; Runcie, Daniel; Cheng, Hao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Investigation |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9991502/ https://www.ncbi.nlm.nih.gov/pubmed/36529897 http://dx.doi.org/10.1093/genetics/iyac183 |
field | value |
---|---|
author | Qu, Jiayi; Runcie, Daniel; Cheng, Hao |
collection | PubMed |
description | Large-scale phenotype data are expected to increase the accuracy of genome-wide prediction and the power of genome-wide association analyses. However, genomic analyses of high-dimensional, highly correlated traits are challenging. We developed a method for implementing high-dimensional Bayesian multivariate regression to simultaneously analyze genetic variants underlying thousands of traits. As a demonstration, we implemented the BayesC prior in the R package MegaLMM. Applied to genomic prediction, MegaBayesC effectively integrated hyperspectral reflectance data from 620 hyperspectral wavelengths to improve the accuracy of genetic value prediction for grain yield in a wheat dataset. For genome-wide association studies, we used simulations to show that MegaBayesC can accurately estimate the effect sizes of QTL across a range of genetic architectures and causes of correlations among traits. To apply MegaBayesC to a realistic scenario involving whole-genome marker data, we developed a 2-stage procedure involving a preliminary step of candidate marker selection prior to multivariate regression. We then used MegaBayesC to identify genetic associations with flowering time in Arabidopsis thaliana, leveraging expression data from 20,843 genes. MegaBayesC selected 15 single nucleotide polymorphisms as important for flowering time, with 13 located within 100 kb of known flowering-time-related genes, a higher validation rate than that achieved by a single-stage analysis using only the flowering time data itself. These results demonstrate that MegaBayesC can efficiently and effectively leverage high-dimensional phenotypes in genetic analyses. |
format | Online Article Text |
id | pubmed-9991502 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | Genetics (article type: Investigation); Oxford University Press, 2022-12-19 |
rights | © The Author(s) 2022. Published by Oxford University Press on behalf of the Genetics Society of America. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Mega-scale Bayesian regression methods for genome-wide prediction and association studies with thousands of traits |
topic | Investigation |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9991502/ https://www.ncbi.nlm.nih.gov/pubmed/36529897 http://dx.doi.org/10.1093/genetics/iyac183 |
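The description refers to the BayesC prior, under which each marker effect is exactly zero with prior probability π and is otherwise drawn from a normal distribution whose variance is shared by all markers. The following is a minimal sketch of a single-trait BayesC Gibbs sampler in Python with NumPy, assuming fixed hyperparameters (π, the marker-effect variance `sigma2_b`, and the residual variance `sigma2_e`) and centered data. It is not the multivariate MegaBayesC implementation in MegaLMM and uses no MegaLMM API; the function name `bayesc_gibbs` and its parameters are hypothetical.

```python
import numpy as np


def bayesc_gibbs(X, y, pi=0.95, sigma2_b=0.5, sigma2_e=1.0,
                 n_iter=600, burn_in=100, seed=0):
    """Single-trait BayesC Gibbs sampler with fixed hyperparameters.

    Hypothetical illustration, not MegaLMM's multivariate MegaBayesC.
    Each marker effect is exactly zero with prior probability `pi` and
    otherwise drawn from N(0, sigma2_b). Returns the posterior mean of
    each effect and each marker's posterior inclusion probability.
    Assumes no monomorphic (constant) marker columns.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                 # center marker columns
    y = y - y.mean()                       # absorb the intercept
    p = X.shape[1]
    xtx = np.einsum("ij,ij->j", X, X)      # x_j' x_j for every marker
    beta = np.zeros(p)
    delta = np.zeros(p, dtype=bool)        # inclusion indicators
    resid = y.copy()                       # y - X @ beta, beta starts at 0
    beta_sum = np.zeros(p)
    incl_sum = np.zeros(p)
    log_odds_prior = np.log1p(-pi) - np.log(pi)   # log[(1 - pi) / pi]

    for it in range(n_iter):
        for j in range(p):
            xj = X[:, j]
            resid += xj * beta[j]          # drop marker j from the fit
            rhs = xj @ resid
            c = xtx[j]
            # Variance of rhs if the effect is excluded (v0) or included (v1)
            v0 = c * sigma2_e
            v1 = c * c * sigma2_b + v0
            log_odds = (log_odds_prior
                        - 0.5 * np.log(v1 / v0)
                        + 0.5 * rhs**2 * (1.0 / v0 - 1.0 / v1))
            p_incl = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -700, 700)))
            delta[j] = rng.random() < p_incl
            if delta[j]:
                # Conditional posterior of a nonzero effect is normal
                lhs = c + sigma2_e / sigma2_b
                beta[j] = rng.normal(rhs / lhs, np.sqrt(sigma2_e / lhs))
            else:
                beta[j] = 0.0
            resid -= xj * beta[j]          # put the updated effect back
        if it >= burn_in:
            beta_sum += beta
            incl_sum += delta
    kept = n_iter - burn_in
    return beta_sum / kept, incl_sum / kept
```

Each sweep first decides, marker by marker, whether the effect is included (comparing the marginal likelihood of `x_j' r` with and without the effect, weighted by the prior odds), and only then draws the effect from its conditional normal posterior. A full analysis would typically also sample π and the variance components; they are held fixed here to keep that two-step update easy to follow.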