
Finding the Optimal Imputation Strategy for Small Cattle Populations


Bibliographic Details
Main Authors: Korkuć, Paula; Arends, Danny; Brockmann, Gudrun A.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6387911/
https://www.ncbi.nlm.nih.gov/pubmed/30833959
http://dx.doi.org/10.3389/fgene.2019.00052
collection PubMed
description Imputation from lower-density SNP chip genotypes to whole-genome sequence level is an established approach to generate high-density genotypes for many individuals. Imputation accuracy depends on many factors, and for small cattle populations such as the endangered German Black Pied cattle (DSN), determining the optimal imputation strategy is especially challenging because only a small number of high-density genotypes is available. In this paper, imputation accuracy was explored with regard to (1) phasing of the target population and of the reference panel for imputation, (2) a 1-step imputation approach, in which 50k genotypes are imputed directly to sequence level, compared with a 2-step approach that first imputes to 700k and then to sequence level, (3) the software tools Beagle and Minimac, and (4) the size and composition of the reference panel for imputation. Analyses were performed for 30 DSN and 30 Holstein Friesian cattle available from the 1000 Bull Genomes Project. Imputation accuracy was assessed using a leave-one-out cross-validation procedure. We observed that phasing of the target populations and the reference panels significantly affects imputation accuracy. Minimac reached higher accuracy with small reference panels, while Beagle performed better with larger reference panels. In contrast to previous research, we found that when only few animals are available at the intermediate imputation step, the 1-step approach yielded higher imputation accuracy than the 2-step approach. Overall, the size of the reference panel is the most important factor for higher imputation accuracy, although using a larger reference panel that includes a related but different breed (Holstein Friesian) significantly reduced imputation accuracy.
Our findings provide specific recommendations for populations, such as DSN, with a limited number of high-density genotyped or sequenced animals. When imputing a small population, we recommend (1) using a large reference panel of the same breed, (2) using a large reference panel consisting of diverse breeds, or (3) when no large reference panel is available, using a smaller same-breed reference panel without including a different related breed.
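The leave-one-out cross-validation named in the description can be illustrated with a minimal Python/NumPy sketch. It is not the authors' pipeline: Beagle and Minimac are replaced by a hypothetical nearest-neighbour imputer (`nearest_neighbour_impute`), phasing and the 2-step design are not modelled, and genotypes are assumed to be coded 0/1/2 in an animals-by-SNPs matrix. Accuracy is reported as genotype concordance and Pearson correlation at the masked (non-chip) positions, two commonly used measures; which measure the authors used is not stated in this record.

```python
import numpy as np


def nearest_neighbour_impute(chip_only, ref, obs_idx):
    """Hypothetical stand-in imputer (NOT Beagle or Minimac).

    Copies all genotypes from the reference animal that best matches the
    target at the observed chip positions `obs_idx`, then restores the
    target's own chip genotypes. Positions outside `obs_idx` are never read.
    """
    # Sum of absolute genotype differences at chip SNPs, per reference animal
    mismatches = np.abs(ref[:, obs_idx] - chip_only[obs_idx]).sum(axis=1)
    best = int(np.argmin(mismatches))  # ties resolve to the first animal
    imputed = ref[best].copy()
    imputed[obs_idx] = chip_only[obs_idx]
    return imputed


def loo_imputation_accuracy(genotypes, obs_idx):
    """Leave-one-out imputation accuracy for a 0/1/2-coded genotype matrix.

    Each animal (row) is held out in turn, reduced to the chip positions
    `obs_idx`, imputed from the remaining animals, and compared with its
    true genotypes at the masked (non-chip) positions only.
    Returns (mean concordance, mean Pearson correlation).
    """
    genotypes = np.asarray(genotypes, dtype=int)
    obs_idx = np.asarray(obs_idx)
    n_animals, n_snps = genotypes.shape
    masked = np.setdiff1d(np.arange(n_snps), obs_idx)
    concordances, correlations = [], []
    for i in range(n_animals):
        truth = genotypes[i]
        chip_only = truth.copy()
        chip_only[masked] = -1  # hide non-chip genotypes from the imputer
        ref = np.delete(genotypes, i, axis=0)  # reference panel without animal i
        imputed = nearest_neighbour_impute(chip_only, ref, obs_idx)
        t, p = truth[masked], imputed[masked]
        concordances.append(np.mean(t == p))
        # Pearson correlation is undefined if either vector is constant
        if t.std() > 0 and p.std() > 0:
            correlations.append(np.corrcoef(t, p)[0, 1])
    mean_r = float(np.mean(correlations)) if correlations else float("nan")
    return float(np.mean(concordances)), mean_r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy panel: 30 hypothetical animals, each an exact copy of one of 3 founders
    founders = rng.integers(0, 3, size=(3, 200))
    panel = founders[rng.integers(0, 3, size=30)]
    chip = np.arange(0, 200, 4)  # every 4th SNP plays the role of the chip
    conc, r = loo_imputation_accuracy(panel, chip)
    print(f"concordance={conc:.3f}  correlation={r:.3f}")
```

With this toy panel most held-out animals share a founder with others, so accuracy is high by construction and says nothing about DSN or Holstein Friesian data. The nearest-neighbour stand-in is used only to keep the evaluation loop (hold out, mask, impute, compare) visible; a real analysis would call the imputation software on phased reference and target files instead.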
id pubmed-6387911
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6387911 2019-03-04 Front Genet Genetics Frontiers Media S.A. 2019-02-18 /pmc/articles/PMC6387911/ /pubmed/30833959 http://dx.doi.org/10.3389/fgene.2019.00052 Text en Copyright © 2019 Korkuć, Arends and Brockmann. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Genetics