Application of a Bayesian non-linear model hybrid scheme to sequence data for genomic prediction and QTL mapping
BACKGROUND: Using whole genome sequence data might improve genomic prediction accuracy, when compared with high-density SNP arrays, and could lead to identification of causal mutations affecting complex traits. For some traits, the most accurate genomic predictions are achieved with non-linear Bayes...
Main Authors: | Wang, Tingting; Chen, Yi-Ping Phoebe; MacLeod, Iona M.; Pryce, Jennie E.; Goddard, Michael E.; Hayes, Ben J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5558724/ https://www.ncbi.nlm.nih.gov/pubmed/28810831 http://dx.doi.org/10.1186/s12864-017-4030-x |
_version_ | 1783257436067463168 |
---|---|
author | Wang, Tingting; Chen, Yi-Ping Phoebe; MacLeod, Iona M.; Pryce, Jennie E.; Goddard, Michael E.; Hayes, Ben J. |
author_facet | Wang, Tingting; Chen, Yi-Ping Phoebe; MacLeod, Iona M.; Pryce, Jennie E.; Goddard, Michael E.; Hayes, Ben J. |
author_sort | Wang, Tingting |
collection | PubMed |
description | BACKGROUND: Using whole genome sequence data might improve genomic prediction accuracy, when compared with high-density SNP arrays, and could lead to identification of causal mutations affecting complex traits. For some traits, the most accurate genomic predictions are achieved with non-linear Bayesian methods. However, as the number of variants and the size of the reference population increase, the computational time required to implement these Bayesian methods (typically with Markov Chain Monte Carlo sampling) becomes infeasibly long. RESULTS: Here, we applied a new method, HyB_BR (for Hybrid BayesR), which implements a mixture model of normal distributions and hybridizes an Expectation-Maximization (EM) algorithm followed by Markov Chain Monte Carlo (MCMC) sampling, to genomic prediction in a large dairy cattle population with imputed whole genome sequence data. The imputed whole genome sequence data included 994,019 variant genotypes of 16,214 Holstein and Jersey bulls and cows. Traits included fat yield, milk volume, protein kg, fat% and protein% in milk, as well as fertility and heat tolerance. HyB_BR achieved genomic prediction accuracies as high as the full MCMC implementation of BayesR, both for predicting a validation set of Holstein and Jersey bulls (multi-breed prediction) and a validation set of Australian Red bulls (across-breed prediction). HyB_BR gave a more than ten-fold reduction in compute time compared with the MCMC implementation of BayesR (48 hours versus 594 hours). We also demonstrate that in many cases HyB_BR identified sequence variants with a high posterior probability of affecting the milk production or fertility traits that were similar to those identified by BayesR. For heat tolerance, both HyB_BR and BayesR found variants in or close to promising candidate genes associated with this trait and not detected by previous studies. CONCLUSIONS: The results demonstrate that HyB_BR is a feasible method for simultaneous genomic prediction and QTL mapping with whole genome sequence in large reference populations. |
format | Online Article Text |
id | pubmed-5558724 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5558724 2017-08-18 Application of a Bayesian non-linear model hybrid scheme to sequence data for genomic prediction and QTL mapping Wang, Tingting; Chen, Yi-Ping Phoebe; MacLeod, Iona M.; Pryce, Jennie E.; Goddard, Michael E.; Hayes, Ben J. BMC Genomics Research Article BioMed Central 2017-08-15 /pmc/articles/PMC5558724/ /pubmed/28810831 http://dx.doi.org/10.1186/s12864-017-4030-x Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Application of a Bayesian non-linear model hybrid scheme to sequence data for genomic prediction and QTL mapping |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5558724/ https://www.ncbi.nlm.nih.gov/pubmed/28810831 http://dx.doi.org/10.1186/s12864-017-4030-x |
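
The description above says that HyB_BR implements a mixture model of normal distributions. As a rough illustration of what such a prior on variant effects can look like, the following Python sketch draws effects from a mixture of zero-mean normals. It is not the HyB_BR algorithm (there is no EM step and no MCMC sampling here). The four variance fractions follow the way BayesR is commonly described elsewhere and are not stated in this record; the mixing proportions, the function name `sample_effects`, and its parameters are hypothetical and chosen only for the example.

```python
import random

# Assumed (not stated in the abstract): four zero-mean normal components whose
# variances are these fractions of the additive genetic variance.
VARIANCE_FRACTIONS = (0.0, 1e-4, 1e-3, 1e-2)
# Hypothetical mixing proportions, chosen only for illustration.
MIXING_PROPORTIONS = (0.95, 0.03, 0.015, 0.005)


def sample_effects(n_variants, genetic_variance=1.0, seed=0):
    """Draw variant effects from a mixture of zero-mean normal distributions."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_variants):
        # Pick a mixture component, then draw the effect from that component.
        (fraction,) = rng.choices(VARIANCE_FRACTIONS, weights=MIXING_PROPORTIONS)
        sd = (fraction * genetic_variance) ** 0.5
        # The zero-variance component gives an effect of exactly zero.
        effects.append(rng.gauss(0.0, sd) if sd > 0 else 0.0)
    return effects


effects = sample_effects(10_000)
share_zero = sum(e == 0.0 for e in effects) / len(effects)
print(f"share of variants with zero effect: {share_zero:.3f}")
```

Under these assumed proportions most variants receive an effect of exactly zero and only a small share fall in the larger-variance components, which is the kind of sparsity that lets such a model combine genomic prediction with QTL mapping.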