Strategies for imputation to whole genome sequence using a single or multi-breed reference population in cattle

Bibliographic details
Main authors: Brøndum, Rasmus Froberg; Guldbrandtsen, Bernt; Sahana, Goutam; Lund, Mogens Sandø; Su, Guosheng
Format: Online, Article, Text
Language: English
Published: BioMed Central 2014
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4152568/
https://www.ncbi.nlm.nih.gov/pubmed/25164068
http://dx.doi.org/10.1186/1471-2164-15-728
collection PubMed
description BACKGROUND: The advent of low-cost next-generation sequencing has made it possible to sequence a large number of dairy and beef bulls, which can serve as a reference for imputing whole-genome sequence data. The aim of this study was to investigate the accuracy and speed of imputation from a high-density SNP marker panel to the whole-genome sequence level. The data comprised 132 Holstein, 42 Jersey, 52 Nordic Red and 16 Brown Swiss bulls with whole-genome sequence data; 16 Holstein, 27 Jersey and 29 Nordic Red bulls had previously been genotyped with the bovine high-density (HD) SNP panel and were used for validation. We investigated how enlarging the reference population by combining data across breeds affects imputation accuracy, and compared the accuracy and speed of IMPUTE2 and BEAGLE using either genotype-probability reference data or pre-phased reference data. All analyses were done on bovine autosome 29 using 387,436 bi-allelic variants and 13,612 SNP markers from the bovine HD panel. RESULTS: A combined-breed reference population led to higher imputation accuracies than a single-breed reference. The highest imputation accuracy for all three test breeds was achieved with BEAGLE using un-phased reference data (mean genotype correlations of 0.90, 0.89 and 0.87 for Holstein, Jersey and Nordic Red, respectively), although IMPUTE2 with un-phased reference data gave similar accuracies for Holstein and Nordic Red. Pre-phasing the reference data led to only a minor decrease in imputation accuracy but greatly reduced computation time. Pre-phasing with BEAGLE was substantially faster than pre-phasing with SHAPEIT2 (2.5 hours vs. 52 hours for 242 individuals), and imputation with pre-phased data was faster in IMPUTE2 than in BEAGLE (5 minutes vs. 50 minutes per individual).
CONCLUSION: Combining reference populations across breeds is a good option for increasing the size of the reference data, and in turn the accuracy of imputation, when only a few animals are available. Pre-phasing the reference data only slightly decreases accuracy but substantially improves speed. Using BEAGLE for pre-phasing and IMPUTE2 for imputation is a fast and accurate strategy.
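The abstract reports imputation accuracy as a "mean genotype correlation" without defining it. A common definition, assumed here and not taken from the paper, is the Pearson correlation between the true genotype (0, 1 or 2 copies of the alternative allele) and the imputed allele dosage, computed per animal across the imputed variants and then averaged over the validation animals. The Python sketch below implements that assumption. The function names and the dictionary input format are hypothetical; real IMPUTE2 or BEAGLE output files would first have to be parsed into this shape.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # no variation in one input: correlation is undefined
    return sxy / math.sqrt(sxx * syy)


def mean_genotype_correlation(true_genotypes, imputed_dosages):
    """Assumed metric: per-animal correlation over variants, averaged over animals.

    true_genotypes:  {animal_id: [0/1/2 genotype per variant]}  (hypothetical format)
    imputed_dosages: {animal_id: [expected allele count 0..2 per variant]}
    Returns (mean correlation, list of per-animal correlations).
    """
    per_animal = []
    for animal, truth in true_genotypes.items():
        r = pearson(truth, imputed_dosages[animal])
        if r is not None:  # skip animals whose correlation is undefined
            per_animal.append(r)
    if not per_animal:
        raise ValueError("no animal had a defined correlation")
    return mean(per_animal), per_animal


if __name__ == "__main__":
    # Toy data invented for illustration; not from the study.
    truth = {"bull_1": [0, 1, 2, 1, 0, 2], "bull_2": [1, 1, 0, 2, 2, 0]}
    dosage = {"bull_1": [0.1, 0.9, 1.8, 1.2, 0.0, 2.0],
              "bull_2": [1.0, 0.7, 0.2, 1.9, 1.6, 0.1]}
    avg, per = mean_genotype_correlation(truth, dosage)
    print(f"per-animal r: {[round(r, 3) for r in per]}, mean r: {avg:.3f}")
```

Computing the correlation per variant across animals is an equally common alternative and can give different values; this sketch is not meant to reproduce the figures reported above.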
id pubmed-4152568
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Genomics (Research Article), published 2014-08-27
rights © Brøndum et al.; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.