
Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants

Bibliographic Details
Main Authors: Butty, Adrien M., Sargolzaei, Mehdi, Miglior, Filippo, Stothard, Paul, Schenkel, Flavio S., Gredler-Grandl, Birgit, Baes, Christine F.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6554347/
https://www.ncbi.nlm.nih.gov/pubmed/31214246
http://dx.doi.org/10.3389/fgene.2019.00510
author Butty, Adrien M.
Sargolzaei, Mehdi
Miglior, Filippo
Stothard, Paul
Schenkel, Flavio S.
Gredler-Grandl, Birgit
Baes, Christine F.
author_sort Butty, Adrien M.
collection PubMed
description Imputation of high-density genotypes to whole-genome sequences (WGS) is a cost-effective method to increase the density of available markers within a population. Imputed genotypes have been successfully used for genomic selection and discovery of variants associated with traits of interest for the population. To allow for the use of imputed genotypes for genomic analyses, accuracy of imputation must be high. Accuracy of imputation is influenced by multiple factors, such as size and composition of the reference group, and the allele frequency of variants included. Understanding the use of imputed WGSs prior to the generation of the reference population is important, as accurate imputation might be more focused, for instance, on common or on rare variants. The aim of this study was to present and evaluate new methods to select animals for sequencing relying on a previously genotyped population. The Genetic Diversity Index method optimizes the number of unique haplotypes in the future reference population, while the Highly Segregating Haplotype selection method targets haplotype alleles found throughout the majority of the population of interest. First the WGSs of a dairy cattle population were simulated. The simulated sequences mimicked the linkage disequilibrium level and the variants’ frequency distribution observed in currently available Holstein sequences. Then, reference populations of different sizes, in which animals were selected using both novel methods proposed here as well as two other methods presented in previous studies, were created. Finally, accuracies of imputation obtained with different reference populations were compared against each other. The novel methods were found to have overall accuracies of imputation of more than 0.85. Accuracies of imputation of rare variants reached values above 0.50. 
In conclusion, if imputed sequences are to be used for discovery of novel associations between variants and traits of interest in the population, animals carrying novel information should be selected and, consequently, the Genetic Diversity Index method proposed here may be used. If sequences are to be used to impute the overall genotyped population, a reference population consisting of common-haplotype carriers selected using the proposed Highly Segregating Haplotype method is recommended.
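The contrast the abstract draws between the two selection goals can be illustrated with a small sketch. This is not the authors' implementation of the Genetic Diversity Index or the Highly Segregating Haplotype method, whose details the abstract does not give. It assumes each animal is described by a set of haplotype-allele labels, and it substitutes a simple greedy coverage rule and a frequency-sum ranking as stand-ins; the function names and the toy data are hypothetical.

```python
from collections import Counter


def select_max_unique(haplotypes, n_select):
    """Greedy stand-in for a diversity-oriented selection (assumption):
    repeatedly pick the animal that adds the most haplotype alleles
    not yet covered by the animals already chosen."""
    covered = set()
    chosen = []
    remaining = dict(haplotypes)
    for _ in range(min(n_select, len(remaining))):
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=lambda a: len(remaining[a] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


def select_common_carriers(haplotypes, n_select):
    """Stand-in for a common-haplotype-oriented selection (assumption):
    rank animals by the summed population frequency of the haplotype
    alleles they carry and keep the top n_select."""
    counts = Counter(h for haps in haplotypes.values() for h in haps)
    n_animals = len(haplotypes)
    score = {
        animal: sum(counts[h] / n_animals for h in haps)
        for animal, haps in haplotypes.items()
    }
    return sorted(score, key=lambda a: (-score[a], a))[:n_select]


# Hypothetical toy data: animal ID -> haplotype alleles carried.
toy = {
    "A": {"h1", "h2"},
    "B": {"h1", "h2"},
    "C": {"h1", "h3"},
    "D": {"h4", "h5"},
}

diverse, covered = select_max_unique(toy, 2)
common = select_common_carriers(toy, 2)
print(diverse, sorted(covered))
print(common)
```

On the toy data the diversity-oriented rule picks animals that share no haplotype alleles, while the frequency-oriented rule picks two carriers of the most widespread alleles, which mirrors the trade-off between rare-variant discovery and population-wide imputation described above.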
format Online
Article
Text
id pubmed-6554347
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6554347 2019-06-18 Front Genet Genetics Frontiers Media S.A. 2019-05-31 /pmc/articles/PMC6554347/ /pubmed/31214246 http://dx.doi.org/10.3389/fgene.2019.00510 Text en Copyright © 2019 Butty, Sargolzaei, Miglior, Stothard, Schenkel, Gredler-Grandl and Baes. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants
topic Genetics