
Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy

BACKGROUND: The aim of this study was to determine the consequences of splitting sequencing effort over multiple breeds for imputation accuracy from a high-density SNP chip towards whole-genome sequence. Such information would assist, for instance, numerically small cattle breeds, but also pig and chi...


Bibliographic Details
Main authors: Bouwman, Aniek C, Veerkamp, Roel F
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4189672/
https://www.ncbi.nlm.nih.gov/pubmed/25277486
http://dx.doi.org/10.1186/s12863-014-0105-8
_version_ 1782338398646697984
author Bouwman, Aniek C
Veerkamp, Roel F
author_facet Bouwman, Aniek C
Veerkamp, Roel F
author_sort Bouwman, Aniek C
collection PubMed
description BACKGROUND: The aim of this study was to determine the consequences of splitting sequencing effort over multiple breeds for imputation accuracy from a high-density SNP chip towards whole-genome sequence. Such information would assist, for instance, numerically small cattle breeds, but also pig and chicken breeders, who have to choose wisely how to spend their sequencing efforts over all the breeds or lines they evaluate. Sequence data from cattle breeds were used, because relatively many individuals from several breeds have currently been sequenced within the 1000 Bull Genomes Project. The advantage of whole-genome sequence data is that it carries the causal mutations, but the question is whether it is possible to impute the causal variants accurately. This study therefore focussed on the imputation accuracy of variants with low minor allele frequency (MAF) and of breed-specific variants. RESULTS: Imputation accuracy was assessed for chromosomes 1 and 29 as the correlation between observed and imputed genotypes. For chromosome 1, the average imputation accuracy was 0.70 with a reference population of 20 Holstein animals, and increased to 0.83 when the reference population was enlarged by including 3 other dairy breeds with 20 animals each. When the same number of animals from the Holstein breed was added instead, the accuracy improved to 0.88, while adding the 3 other breeds to the reference population of 80 Holstein animals improved the average imputation accuracy only marginally, to 0.89. For chromosome 29, the average imputation accuracy was lower. Some variants benefitted from the inclusion of other breeds in the reference population, initially determined by the MAF of the variant in each breed, but even Holstein-specific variants gained imputation accuracy from the multi-breed reference population.
CONCLUSIONS: This study shows that splitting sequencing effort over multiple breeds and combining the reference populations is a good strategy for imputation from high-density SNP panels towards whole-genome sequence when reference populations are small and sequencing effort is limiting. When sequencing effort is limiting and interest lies in multiple breeds or lines, this strategy enables imputation for each breed.
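Illustrative note: the abstract defines imputation accuracy as the correlation between observed and imputed genotypes. The sketch below shows one way such a measure could be computed per variant and then averaged. It is not the authors' code. The function names, the 0/1/2 allele-count coding, the skipping of variants without variation, and the example genotypes are all assumptions made for illustration.

```python
from math import sqrt


def imputation_accuracy(observed, imputed):
    """Pearson correlation between observed and imputed genotypes of one variant.

    Assumption: genotypes are coded as allele counts (0, 1, 2); imputed values
    may also be fractional dosages. Returns None when either vector has no
    variation, because the correlation is undefined in that case.
    """
    n = len(observed)
    if n != len(imputed) or n < 2:
        raise ValueError("need two equally long vectors with at least 2 animals")
    mean_obs = sum(observed) / n
    mean_imp = sum(imputed) / n
    cov = sum((o - mean_obs) * (i - mean_imp) for o, i in zip(observed, imputed))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_imp = sum((i - mean_imp) ** 2 for i in imputed)
    if var_obs == 0 or var_imp == 0:
        return None
    return cov / sqrt(var_obs * var_imp)


def mean_accuracy(variants):
    """Average per-variant accuracy, skipping variants with undefined correlation."""
    values = [imputation_accuracy(obs, imp) for obs, imp in variants]
    defined = [v for v in values if v is not None]
    return sum(defined) / len(defined) if defined else None


# Hypothetical genotypes for 5 validation animals at 2 variants.
variants = [
    ([0, 1, 2, 1, 0], [0, 1, 2, 1, 0]),  # imputed without error
    ([0, 1, 2, 1, 0], [0, 1, 1, 1, 0]),  # one homozygote imputed as heterozygote
]
print(mean_accuracy(variants))
```

A correlation-based measure of this kind accounts for allele frequency, which is why it is often preferred over a simple concordance rate when variants with low MAF are of interest.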
format Online
Article
Text
id pubmed-4189672
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-41896722014-10-23 Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy Bouwman, Aniek C Veerkamp, Roel F BMC Genet Research Article BioMed Central 2014-10-03 /pmc/articles/PMC4189672/ /pubmed/25277486 http://dx.doi.org/10.1186/s12863-014-0105-8 Text en © Bouwman and Veerkamp; licensee BioMed Central Ltd. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Bouwman, Aniek C
Veerkamp, Roel F
Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy
title Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy
title_full Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy
title_fullStr Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy
title_full_unstemmed Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy
title_short Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy
title_sort consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4189672/
https://www.ncbi.nlm.nih.gov/pubmed/25277486
http://dx.doi.org/10.1186/s12863-014-0105-8
work_keys_str_mv AT bouwmananiekc consequencesofsplittingwholegenomesequencingeffortovermultiplebreedsonimputationaccuracy
AT veerkamproelf consequencesofsplittingwholegenomesequencingeffortovermultiplebreedsonimputationaccuracy