
A computationally efficient algorithm for genomic prediction using a Bayesian model

BACKGROUND: Genomic prediction of breeding values from dense single nucleotide polymorphism (SNP) genotypes is used for livestock and crop breeding, and can also be used to predict disease risk in humans. For some traits, the most accurate genomic predictions are achieved with non-linear estimates...


Bibliographic Details
Main authors: Wang, Tingting; Chen, Yi-Ping Phoebe; Goddard, Michael E; Meuwissen, Theo HE; Kemper, Kathryn E; Hayes, Ben J
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4415253/
https://www.ncbi.nlm.nih.gov/pubmed/25926276
http://dx.doi.org/10.1186/s12711-014-0082-4
author Wang, Tingting
Chen, Yi-Ping Phoebe
Goddard, Michael E
Meuwissen, Theo HE
Kemper, Kathryn E
Hayes, Ben J
collection PubMed
description BACKGROUND: Genomic prediction of breeding values from dense single nucleotide polymorphism (SNP) genotypes is used for livestock and crop breeding, and can also be used to predict disease risk in humans. For some traits, the most accurate genomic predictions are achieved with non-linear estimates of SNP effects from Bayesian methods that treat SNP effects as random effects from a heavy-tailed prior distribution. These Bayesian methods are usually implemented via Markov chain Monte Carlo (MCMC) schemes to sample from the posterior distribution of SNP effects, which is computationally expensive. Our aim was to develop an efficient expectation–maximisation algorithm (emBayesR) that gives estimates of SNP effects and accuracies of genomic prediction similar to those of the MCMC implementation of BayesR (a Bayesian method for genomic prediction), but with greatly reduced computation time. METHODS: emBayesR is an approximate EM algorithm that retains the BayesR model assumption that SNP effects are sampled from a mixture of normal distributions with increasing variance. emBayesR differs from other proposed non-MCMC implementations of Bayesian methods for genomic prediction in that it estimates the effect of each SNP while allowing for the error associated with estimation of all other SNP effects. emBayesR was compared to BayesR using simulated data and real dairy cattle data with 632 003 genotyped SNPs to determine whether the MCMC and the expectation–maximisation approaches give similar accuracies of genomic prediction. RESULTS: Allowing for the error associated with estimation of other SNP effects when estimating the effect of each SNP in emBayesR improved the accuracy of genomic prediction compared with emBayesR without this error correction, with both simulated and real data. When averaged over nine dairy traits, the accuracy of genomic prediction with emBayesR was only 0.5% lower than that from BayesR. However, emBayesR reduced computing time by up to 8-fold compared to BayesR. CONCLUSIONS: The emBayesR algorithm described here achieved accuracies of genomic prediction similar to those of BayesR for a range of simulated data and real 630 K dairy SNP data. emBayesR needs less computing time than BayesR, which will allow it to be applied to larger datasets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12711-014-0082-4) contains supplementary material, which is available to authorized users.
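The mixture assumption in the abstract can be illustrated with a small, generic EM sketch in Python. This is not the emBayesR algorithm: it only estimates the mixing proportions of a zero-mean normal mixture with fixed variances from noisy effect estimates, and it does not model the correction for errors in the other SNP effects. The function names, the component variances and the error variance `se2` are assumptions made for illustration and are not taken from this record.

```python
import math


def normal_pdf(x, var):
    """Density of a zero-mean normal distribution with variance var at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def em_mixing_proportions(estimates, variances, se2, n_iter=200, tol=1e-10):
    """Generic EM for the mixing proportions of a zero-mean normal mixture.

    estimates: noisy effect estimates (hypothetical input)
    variances: fixed component variances in increasing order (assumed values)
    se2:       assumed estimation-error variance, added to every component so
               that a zero-variance component still has a proper density
    Returns (proportions, responsibilities).
    """
    k = len(variances)
    pi = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each estimate belongs to each component
        resp = []
        for g in estimates:
            w = [pi[j] * normal_pdf(g, variances[j] + se2) for j in range(k)]
            total = sum(w)
            resp.append([wj / total for wj in w])
        # M-step: each proportion becomes the mean responsibility of its component
        new_pi = [sum(r[j] for r in resp) / len(estimates) for j in range(k)]
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi, resp


if __name__ == "__main__":
    # Toy inputs: mostly near-zero estimates plus a few larger ones.
    toy = [0.0, 0.01, -0.02, 0.3, -0.25, 0.005, 0.0, 0.15]
    props, _ = em_mixing_proportions(toy, [0.0, 1e-4, 1e-3, 1e-2], se2=1e-4)
    print(props)
```

Under these assumptions, larger estimates receive more posterior weight on the high-variance components, which is the shrinkage behaviour a heavy-tailed mixture prior is meant to provide.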
format Online
Article
Text
id pubmed-4415253
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4415253 2015-05-01
A computationally efficient algorithm for genomic prediction using a Bayesian model
Wang, Tingting; Chen, Yi-Ping Phoebe; Goddard, Michael E; Meuwissen, Theo HE; Kemper, Kathryn E; Hayes, Ben J
Genet Sel Evol
Research
BioMed Central 2015-04-30
/pmc/articles/PMC4415253/ /pubmed/25926276 http://dx.doi.org/10.1186/s12711-014-0082-4
Text en
© Wang et al.; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A computationally efficient algorithm for genomic prediction using a Bayesian model
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4415253/
https://www.ncbi.nlm.nih.gov/pubmed/25926276
http://dx.doi.org/10.1186/s12711-014-0082-4