
Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect


Bibliographic Details
Main authors:	van den Berg, Irene, Bowman, Phil J., MacLeod, Iona M., Hayes, Ben J., Wang, Tingting, Bolormaa, Sunduimijid, Goddard, Mike E.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5609075/ https://www.ncbi.nlm.nih.gov/pubmed/28934948 http://dx.doi.org/10.1186/s12711-017-0347-9

author	van den Berg, Irene; Bowman, Phil J.; MacLeod, Iona M.; Hayes, Ben J.; Wang, Tingting; Bolormaa, Sunduimijid; Goddard, Mike E.
collection	PubMed
description	BACKGROUND: The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analysis of real data do not always show an increase in accuracy from sequence data compared to high-density (HD) single nucleotide polymorphism (SNP) chip genotypes. In addition, the sheer number of variants makes analysis of all variants and accurate estimation of all effects computationally challenging. Our objective was to find a strategy to approximate the analysis of whole-sequence data with a Bayesian variable selection model. Using a simulated dataset, we applied a Bayes R hybrid model to analyse whole-sequence data, test the effect of dropping a proportion of variants during the analysis, and test how the analysis can be split into separate analyses per chromosome to reduce the elapsed computing time. We also investigated the effect of imputation errors on prediction accuracy. Subsequently, we applied the approach to a dataset that contained imputed sequences and records for production and fertility traits for 38,492 Holstein, Jersey, Australian Red and crossbred bulls and cows. RESULTS: With the simulated dataset, we found that prediction accuracy was highly increased for a breed that was not represented in the training population for sequence data compared to HD SNP data. Either dropping part of the variants during the analysis or splitting the analysis into separate analyses per chromosome decreased accuracy compared to analysing whole-sequence data. First, dropping variants from each chromosome and reanalysing the retained variants together resulted in an accuracy similar to that obtained when analysing whole-sequence data. Adding imputation errors decreased prediction accuracy, especially for errors in the validation population. With real data, using sequence variants resulted in accuracies that were similar to those obtained with the HD SNPs. 
CONCLUSIONS: We present an efficient approach to approximate analysis of whole-sequence data with a Bayesian variable selection model. The lack of increase in prediction accuracy when applied to real data could be due to imputation errors, which demonstrates the importance of developing more accurate methods of imputation or directly genotyping sequence variants that have a major effect in the prediction equation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12711-017-0347-9) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5609075
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5609075 2017-09-25 Genet Sel Evol Research Article BioMed Central 2017-09-21 /pmc/articles/PMC5609075/ /pubmed/28934948 http://dx.doi.org/10.1186/s12711-017-0347-9 Text en © The Author(s) 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5609075/ https://www.ncbi.nlm.nih.gov/pubmed/28934948 http://dx.doi.org/10.1186/s12711-017-0347-9
