On the use of whole-genome sequence data for across-breed genomic prediction and fine-scale mapping of QTL

Bibliographic Details
Main authors: Meuwissen, Theo; van den Berg, Irene; Goddard, Mike
Format: Online, Article, Text
Language: English
Published: BioMed Central 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7908738/
https://www.ncbi.nlm.nih.gov/pubmed/33637049
http://dx.doi.org/10.1186/s12711-021-00607-4
collection PubMed
description BACKGROUND: Whole-genome sequence (WGS) data are increasingly available for large numbers of individuals in animal and plant breeding and in human genetics, through second-generation resequencing technologies, 1000 Genomes projects, and large-scale imputation of genotypes from lower marker densities. Here, we present a computationally fast implementation of a variable selection genomic prediction method that can handle WGS data on more than 35,000 individuals, test its accuracy for across-breed predictions, and assess its quantitative trait locus (QTL) mapping precision. METHODS: The Markov chain Monte Carlo (MCMC) variable selection model (Bayes GC) simultaneously fits a genomic best linear unbiased prediction (GBLUP) term, i.e. a polygenic effect whose correlations are described by a genomic relationship matrix (G), and a Bayes C term, i.e. a set of single nucleotide polymorphisms (SNPs) with large effects selected by the model. Computational speed is improved by Metropolis–Hastings sampling that directs computations to the SNPs that are, a priori, most likely to be included in the model, and by running many relatively short MCMC chains. Memory requirements are reduced by storing the genotype matrix in binary form. The model was tested on a WGS dataset of Holstein, Jersey and Australian Red cattle, comprising 4,809,520 genotypes on 35,549 individuals together with their milk, fat and protein yields and fat and protein percentages. RESULTS: Prediction accuracy for Jersey individuals improved by 1.5% when using across-breed GBLUP compared to within-breed predictions. Using WGS data instead of 600k SNP-chip data yielded an average accuracy improvement of 3% for Australian Red cows. QTL were fine-mapped by locating the SNP with the highest posterior probability of being included in the model. Various QTL known from the literature were rediscovered, and a new SNP affecting milk production was discovered on chromosome 20 at 34.501126 Mb. Owing to the high mapping precision, it was clear that many of the discovered QTL were shared across the five dairy traits. CONCLUSIONS: Across-breed Bayes GC genomic prediction improved prediction accuracies compared to GBLUP. The combination of across-breed WGS data and Bayesian genomic prediction proved remarkably effective for fine-mapping QTL.
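The abstract states that memory is reduced by storing the genotype matrix "in binary form" but does not give the layout. A minimal sketch, assuming a simple 2-bit code per genotype (0, 1 or 2 copies of the alternative allele, with 3 reserved for missing) and four genotypes per byte; this is not the PLINK .bed encoding, and `pack_genotypes` and `unpack_genotypes` are hypothetical names:

```python
MISSING = 3  # assumed code for a missing genotype


def pack_genotypes(genotypes):
    """Pack genotype codes (0, 1, 2 or MISSING) into 2 bits each, 4 per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, MISSING):
            raise ValueError(f"invalid genotype code {g!r} at position {i}")
        # Genotype i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        packed[i // 4] |= g << (2 * (i % 4))
    return packed


def unpack_genotypes(packed, n):
    """Recover the first n genotype codes from a packed bytearray."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]
```

For scale, if 4,809,520 is read as the number of variants genotyped on each of the 35,549 individuals (an interpretation, not stated explicitly above), the matrix holds 170,973,626,480 genotypes: about 1.37 TB as 8-byte floating-point values, about 171 GB at one byte per genotype, and about 42.7 GB at two bits per genotype.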
id pubmed-7908738
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Genet Sel Evol, Research Article, published 2021-02-26
License: © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research Article