Strategies for imputation to whole genome sequence using a single or multi-breed reference population in cattle
BACKGROUND: The advent of low cost next generation sequencing has made it possible to sequence a large number of dairy and beef bulls which can be used as a reference for imputation of whole genome sequence data. The aim of this study was to investigate the accuracy and speed of imputation from a hi...
Main authors: | Brøndum, Rasmus Froberg; Guldbrandtsen, Bernt; Sahana, Goutam; Lund, Mogens Sandø; Su, Guosheng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4152568/ https://www.ncbi.nlm.nih.gov/pubmed/25164068 http://dx.doi.org/10.1186/1471-2164-15-728 |
_version_ | 1782333143313809408 |
---|---|
author | Brøndum, Rasmus Froberg Guldbrandtsen, Bernt Sahana, Goutam Lund, Mogens Sandø Su, Guosheng |
author_facet | Brøndum, Rasmus Froberg Guldbrandtsen, Bernt Sahana, Goutam Lund, Mogens Sandø Su, Guosheng |
author_sort | Brøndum, Rasmus Froberg |
collection | PubMed |
description | BACKGROUND: The advent of low cost next generation sequencing has made it possible to sequence a large number of dairy and beef bulls which can be used as a reference for imputation of whole genome sequence data. The aim of this study was to investigate the accuracy and speed of imputation from a high density SNP marker panel to whole genome sequence level. Data contained 132 Holstein, 42 Jersey, 52 Nordic Red and 16 Brown Swiss bulls with whole genome sequence data; 16 Holstein, 27 Jersey and 29 Nordic Reds had previously been typed with the bovine high density SNP panel and were used for validation. We investigated the effect of enlarging the reference population by combining data across breeds on the accuracy of imputation, and the accuracy and speed of both IMPUTE2 and BEAGLE using either genotype probability reference data or pre-phased reference data. All analyses were done on bovine autosome 29 using 387,436 bi-allelic variants and 13,612 SNP markers from the bovine HD panel. RESULTS: A combined breed reference population led to higher imputation accuracies than did a single breed reference. The highest accuracy of imputation for all three test breeds was achieved when using BEAGLE with un-phased reference data (mean genotype correlations of 0.90, 0.89 and 0.87 for Holstein, Jersey and Nordic Red, respectively), but IMPUTE2 with un-phased reference data gave similar accuracies for Holsteins and Nordic Red. Pre-phasing the reference data led to only a minor decrease in the imputation accuracy, but gave a large improvement in computation time. Pre-phasing with BEAGLE was substantially faster than pre-phasing with SHAPEIT2 (2.5 hours vs. 52 hours for 242 individuals), and imputation with pre-phased data was faster in IMPUTE2 than in BEAGLE (5 minutes vs. 50 minutes per individual). CONCLUSION: Combining reference populations across breeds is a good option to increase the size of the reference data and in turn the accuracy of imputation when only a few animals are available. Pre-phasing the reference data only slightly decreases the accuracy but gives substantial improvements in speed. Using BEAGLE for pre-phasing and IMPUTE2 for imputation is a fast and accurate strategy. |
format | Online Article Text |
id | pubmed-4152568 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-41525682014-09-09 Strategies for imputation to whole genome sequence using a single or multi-breed reference population in cattle Brøndum, Rasmus Froberg Guldbrandtsen, Bernt Sahana, Goutam Lund, Mogens Sandø Su, Guosheng BMC Genomics Research Article BACKGROUND: The advent of low cost next generation sequencing has made it possible to sequence a large number of dairy and beef bulls which can be used as a reference for imputation of whole genome sequence data. The aim of this study was to investigate the accuracy and speed of imputation from a high density SNP marker panel to whole genome sequence level. Data contained 132 Holstein, 42 Jersey, 52 Nordic Red and 16 Brown Swiss bulls with whole genome sequence data; 16 Holstein, 27 Jersey and 29 Nordic Reds had previously been typed with the bovine high density SNP panel and were used for validation. We investigated the effect of enlarging the reference population by combining data across breeds on the accuracy of imputation, and the accuracy and speed of both IMPUTE2 and BEAGLE using either genotype probability reference data or pre-phased reference data. All analyses were done on bovine autosome 29 using 387,436 bi-allelic variants and 13,612 SNP markers from the bovine HD panel. RESULTS: A combined breed reference population led to higher imputation accuracies than did a single breed reference. The highest accuracy of imputation for all three test breeds was achieved when using BEAGLE with un-phased reference data (mean genotype correlations of 0.90, 0.89 and 0.87 for Holstein, Jersey and Nordic Red, respectively), but IMPUTE2 with un-phased reference data gave similar accuracies for Holsteins and Nordic Red. Pre-phasing the reference data led to only a minor decrease in the imputation accuracy, but gave a large improvement in computation time. Pre-phasing with BEAGLE was substantially faster than pre-phasing with SHAPEIT2 (2.5 hours vs. 52 hours for 242 individuals), and imputation with pre-phased data was faster in IMPUTE2 than in BEAGLE (5 minutes vs. 50 minutes per individual). CONCLUSION: Combining reference populations across breeds is a good option to increase the size of the reference data and in turn the accuracy of imputation when only a few animals are available. Pre-phasing the reference data only slightly decreases the accuracy but gives substantial improvements in speed. Using BEAGLE for pre-phasing and IMPUTE2 for imputation is a fast and accurate strategy. BioMed Central 2014-08-27 /pmc/articles/PMC4152568/ /pubmed/25164068 http://dx.doi.org/10.1186/1471-2164-15-728 Text en © Brøndum et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Brøndum, Rasmus Froberg Guldbrandtsen, Bernt Sahana, Goutam Lund, Mogens Sandø Su, Guosheng Strategies for imputation to whole genome sequence using a single or multi-breed reference population in cattle |
title | Strategies for imputation to whole genome sequence using a single or multi-breed reference population in cattle |
title_sort | strategies for imputation to whole genome sequence using a single or multi-breed reference population in cattle |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4152568/ https://www.ncbi.nlm.nih.gov/pubmed/25164068 http://dx.doi.org/10.1186/1471-2164-15-728 |