Accuracy of whole-genome sequence imputation using hybrid peeling in large pedigreed livestock populations

BACKGROUND: The coupling of appropriate sequencing strategies and imputation methods is critical for assembling large whole-genome sequence datasets from livestock populations for research and breeding. In this paper, we describe and validate the coupling of a sequencing strategy with the imputation...

Bibliographic Details
Main Authors: Ros-Freixedes, Roger; Whalen, Andrew; Chen, Ching-Yi; Gorjanc, Gregor; Herring, William O.; Mileham, Alan J.; Hickey, John M.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7132992/
https://www.ncbi.nlm.nih.gov/pubmed/32248811
http://dx.doi.org/10.1186/s12711-020-00536-8
_version_ 1783517542794395648
author Ros-Freixedes, Roger
Whalen, Andrew
Chen, Ching-Yi
Gorjanc, Gregor
Herring, William O.
Mileham, Alan J.
Hickey, John M.
author_facet Ros-Freixedes, Roger
Whalen, Andrew
Chen, Ching-Yi
Gorjanc, Gregor
Herring, William O.
Mileham, Alan J.
Hickey, John M.
author_sort Ros-Freixedes, Roger
collection PubMed
description BACKGROUND: The coupling of appropriate sequencing strategies and imputation methods is critical for assembling large whole-genome sequence datasets from livestock populations for research and breeding. In this paper, we describe and validate the coupling of a sequencing strategy with the imputation method hybrid peeling in real animal breeding settings. METHODS: We used data from four pig populations of different size (18,349 to 107,815 individuals) that were widely genotyped at densities between 15,000 and 75,000 markers genome-wide. Around 2% of the individuals in each population were sequenced (most of them at 1× or 2× and 37–92 individuals per population, totalling 284, at 15–30×). We imputed whole-genome sequence data with hybrid peeling. We evaluated the imputation accuracy by removing the sequence data of the 284 individuals with high coverage, using a leave-one-out design. We simulated data that mimicked the sequencing strategy used in the real populations to quantify the factors that affected the individual-wise and variant-wise imputation accuracies using regression trees. RESULTS: Imputation accuracy was high for the majority of individuals in all four populations (median individual-wise dosage correlation: 0.97). Imputation accuracy was lower for individuals in the earliest generations of each population than for the rest, due to the lack of marker array data for themselves and their ancestors. The main factors that determined the individual-wise imputation accuracy were the genotyping status, the availability of marker array data for immediate ancestors, and the degree of connectedness to the rest of the population, but sequencing coverage of the relatives had no effect. The main factors that determined variant-wise imputation accuracy were the minor allele frequency and the number of individuals with sequencing coverage at each variant site. Results were validated with the empirical observations. 
CONCLUSIONS: We demonstrate that the coupling of an appropriate sequencing strategy and hybrid peeling is a powerful strategy for generating whole-genome sequence data with high accuracy in large pedigreed populations where only a small fraction of individuals (2%) had been sequenced, mostly at low coverage. This is a critical step for the successful implementation of whole-genome sequence data for genomic prediction and fine-mapping of causal variants.
format Online
Article
Text
id pubmed-7132992
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7132992 2020-04-11 Accuracy of whole-genome sequence imputation using hybrid peeling in large pedigreed livestock populations Ros-Freixedes, Roger Whalen, Andrew Chen, Ching-Yi Gorjanc, Gregor Herring, William O. Mileham, Alan J. Hickey, John M. Genet Sel Evol Research Article BACKGROUND: The coupling of appropriate sequencing strategies and imputation methods is critical for assembling large whole-genome sequence datasets from livestock populations for research and breeding. In this paper, we describe and validate the coupling of a sequencing strategy with the imputation method hybrid peeling in real animal breeding settings. METHODS: We used data from four pig populations of different size (18,349 to 107,815 individuals) that were widely genotyped at densities between 15,000 and 75,000 markers genome-wide. Around 2% of the individuals in each population were sequenced (most of them at 1× or 2× and 37–92 individuals per population, totalling 284, at 15–30×). We imputed whole-genome sequence data with hybrid peeling. We evaluated the imputation accuracy by removing the sequence data of the 284 individuals with high coverage, using a leave-one-out design. We simulated data that mimicked the sequencing strategy used in the real populations to quantify the factors that affected the individual-wise and variant-wise imputation accuracies using regression trees. RESULTS: Imputation accuracy was high for the majority of individuals in all four populations (median individual-wise dosage correlation: 0.97). Imputation accuracy was lower for individuals in the earliest generations of each population than for the rest, due to the lack of marker array data for themselves and their ancestors.
The main factors that determined the individual-wise imputation accuracy were the genotyping status, the availability of marker array data for immediate ancestors, and the degree of connectedness to the rest of the population, but sequencing coverage of the relatives had no effect. The main factors that determined variant-wise imputation accuracy were the minor allele frequency and the number of individuals with sequencing coverage at each variant site. Results were validated with the empirical observations. CONCLUSIONS: We demonstrate that the coupling of an appropriate sequencing strategy and hybrid peeling is a powerful strategy for generating whole-genome sequence data with high accuracy in large pedigreed populations where only a small fraction of individuals (2%) had been sequenced, mostly at low coverage. This is a critical step for the successful implementation of whole-genome sequence data for genomic prediction and fine-mapping of causal variants. BioMed Central 2020-04-06 /pmc/articles/PMC7132992/ /pubmed/32248811 http://dx.doi.org/10.1186/s12711-020-00536-8 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research Article
Ros-Freixedes, Roger
Whalen, Andrew
Chen, Ching-Yi
Gorjanc, Gregor
Herring, William O.
Mileham, Alan J.
Hickey, John M.
Accuracy of whole-genome sequence imputation using hybrid peeling in large pedigreed livestock populations
title Accuracy of whole-genome sequence imputation using hybrid peeling in large pedigreed livestock populations
title_full Accuracy of whole-genome sequence imputation using hybrid peeling in large pedigreed livestock populations
title_fullStr Accuracy of whole-genome sequence imputation using hybrid peeling in large pedigreed livestock populations
title_full_unstemmed Accuracy of whole-genome sequence imputation using hybrid peeling in large pedigreed livestock populations
title_short Accuracy of whole-genome sequence imputation using hybrid peeling in large pedigreed livestock populations
title_sort accuracy of whole-genome sequence imputation using hybrid peeling in large pedigreed livestock populations
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7132992/
https://www.ncbi.nlm.nih.gov/pubmed/32248811
http://dx.doi.org/10.1186/s12711-020-00536-8