Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants
Imputation of high-density genotypes to whole-genome sequences (WGS) is a cost-effective method to increase the density of available markers within a population. Imputed genotypes have been successfully used for genomic selection and discovery of variants associated with traits of interest for the p...
Main Authors: | Butty, Adrien M.; Sargolzaei, Mehdi; Miglior, Filippo; Stothard, Paul; Schenkel, Flavio S.; Gredler-Grandl, Birgit; Baes, Christine F. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2019 |
Subjects: | Genetics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6554347/ https://www.ncbi.nlm.nih.gov/pubmed/31214246 http://dx.doi.org/10.3389/fgene.2019.00510 |
_version_ | 1783424955317223424 |
---|---|
author | Butty, Adrien M. Sargolzaei, Mehdi Miglior, Filippo Stothard, Paul Schenkel, Flavio S. Gredler-Grandl, Birgit Baes, Christine F. |
author_facet | Butty, Adrien M. Sargolzaei, Mehdi Miglior, Filippo Stothard, Paul Schenkel, Flavio S. Gredler-Grandl, Birgit Baes, Christine F. |
author_sort | Butty, Adrien M. |
collection | PubMed |
description | Imputation of high-density genotypes to whole-genome sequences (WGS) is a cost-effective method to increase the density of available markers within a population. Imputed genotypes have been successfully used for genomic selection and discovery of variants associated with traits of interest for the population. To allow for the use of imputed genotypes for genomic analyses, accuracy of imputation must be high. Accuracy of imputation is influenced by multiple factors, such as size and composition of the reference group, and the allele frequency of variants included. Understanding the use of imputed WGSs prior to the generation of the reference population is important, as accurate imputation might be more focused, for instance, on common or on rare variants. The aim of this study was to present and evaluate new methods to select animals for sequencing relying on a previously genotyped population. The Genetic Diversity Index method optimizes the number of unique haplotypes in the future reference population, while the Highly Segregating Haplotype selection method targets haplotype alleles found throughout the majority of the population of interest. First, the WGSs of a dairy cattle population were simulated. The simulated sequences mimicked the linkage disequilibrium level and the variants’ frequency distribution observed in currently available Holstein sequences. Then, reference populations of different sizes, in which animals were selected using both novel methods proposed here as well as two other methods presented in previous studies, were created. Finally, accuracies of imputation obtained with different reference populations were compared against each other. The novel methods were found to have overall accuracies of imputation of more than 0.85. Accuracies of imputation of rare variants reached values above 0.50. In conclusion, if imputed sequences are to be used for discovery of novel associations between variants and traits of interest in the population, animals carrying novel information should be selected and, consequently, the Genetic Diversity Index method proposed here may be used. If sequences are to be used to impute the overall genotyped population, a reference population consisting of common haplotype carriers selected using the proposed Highly Segregating Haplotype method is recommended. |
format | Online Article Text |
id | pubmed-6554347 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-6554347 2019-06-18 Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants Butty, Adrien M. Sargolzaei, Mehdi Miglior, Filippo Stothard, Paul Schenkel, Flavio S. Gredler-Grandl, Birgit Baes, Christine F. Front Genet Genetics Imputation of high-density genotypes to whole-genome sequences (WGS) is a cost-effective method to increase the density of available markers within a population. Imputed genotypes have been successfully used for genomic selection and discovery of variants associated with traits of interest for the population. To allow for the use of imputed genotypes for genomic analyses, accuracy of imputation must be high. Accuracy of imputation is influenced by multiple factors, such as size and composition of the reference group, and the allele frequency of variants included. Understanding the use of imputed WGSs prior to the generation of the reference population is important, as accurate imputation might be more focused, for instance, on common or on rare variants. The aim of this study was to present and evaluate new methods to select animals for sequencing relying on a previously genotyped population. The Genetic Diversity Index method optimizes the number of unique haplotypes in the future reference population, while the Highly Segregating Haplotype selection method targets haplotype alleles found throughout the majority of the population of interest. First, the WGSs of a dairy cattle population were simulated. The simulated sequences mimicked the linkage disequilibrium level and the variants’ frequency distribution observed in currently available Holstein sequences. Then, reference populations of different sizes, in which animals were selected using both novel methods proposed here as well as two other methods presented in previous studies, were created. Finally, accuracies of imputation obtained with different reference populations were compared against each other. The novel methods were found to have overall accuracies of imputation of more than 0.85. Accuracies of imputation of rare variants reached values above 0.50. In conclusion, if imputed sequences are to be used for discovery of novel associations between variants and traits of interest in the population, animals carrying novel information should be selected and, consequently, the Genetic Diversity Index method proposed here may be used. If sequences are to be used to impute the overall genotyped population, a reference population consisting of common haplotype carriers selected using the proposed Highly Segregating Haplotype method is recommended. Frontiers Media S.A. 2019-05-31 /pmc/articles/PMC6554347/ /pubmed/31214246 http://dx.doi.org/10.3389/fgene.2019.00510 Text en Copyright © 2019 Butty, Sargolzaei, Miglior, Stothard, Schenkel, Gredler-Grandl and Baes. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Butty, Adrien M. Sargolzaei, Mehdi Miglior, Filippo Stothard, Paul Schenkel, Flavio S. Gredler-Grandl, Birgit Baes, Christine F. Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants |
title | Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants |
title_full | Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants |
title_fullStr | Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants |
title_full_unstemmed | Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants |
title_short | Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants |
title_sort | optimizing selection of the reference population for genotype imputation from array to sequence variants |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6554347/ https://www.ncbi.nlm.nih.gov/pubmed/31214246 http://dx.doi.org/10.3389/fgene.2019.00510 |
work_keys_str_mv | AT buttyadrienm optimizingselectionofthereferencepopulationforgenotypeimputationfromarraytosequencevariants AT sargolzaeimehdi optimizingselectionofthereferencepopulationforgenotypeimputationfromarraytosequencevariants AT migliorfilippo optimizingselectionofthereferencepopulationforgenotypeimputationfromarraytosequencevariants AT stothardpaul optimizingselectionofthereferencepopulationforgenotypeimputationfromarraytosequencevariants AT schenkelflavios optimizingselectionofthereferencepopulationforgenotypeimputationfromarraytosequencevariants AT gredlergrandlbirgit optimizingselectionofthereferencepopulationforgenotypeimputationfromarraytosequencevariants AT baeschristinef optimizingselectionofthereferencepopulationforgenotypeimputationfromarraytosequencevariants |
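
The description above contrasts two selection goals: maximizing the number of unique haplotypes in the reference population, and favoring haplotype alleles that are common in the population of interest. The record does not give the authors' algorithms, so the following is only a minimal greedy sketch of those two goals and not the published Genetic Diversity Index or Highly Segregating Haplotype methods. The function name `select_reference`, the `weight_by_frequency` flag, and the toy haplotype labels are all hypothetical.

```python
from collections import Counter


def select_reference(haplotypes, n_select, weight_by_frequency=False):
    """Greedily pick animals for sequencing (illustrative sketch only).

    haplotypes: dict mapping animal id -> set of haplotype allele labels.
    weight_by_frequency=False rewards each not-yet-covered haplotype equally
    (diversity-oriented); True weights each new haplotype by its number of
    carriers in the population (common-haplotype-oriented).
    """
    # Number of animals carrying each haplotype allele.
    carriers = Counter(h for haps in haplotypes.values() for h in haps)
    covered = set()
    selected = []
    candidates = set(haplotypes)

    def gain(animal):
        new = haplotypes[animal] - covered
        if weight_by_frequency:
            return sum(carriers[h] for h in new)
        return len(new)

    while candidates and len(selected) < n_select:
        # Sorting makes tie-breaking deterministic (first id in sorted order).
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        covered |= haplotypes[best]
        candidates.remove(best)
    return selected


# Toy data: hypothetical animals and haplotype alleles at a single region.
toy = {
    "A": {"h1", "h2"},
    "B": {"h1", "h3"},
    "C": {"h4", "h5"},
    "D": {"h2", "h3"},
    "E": {"h3"},
}
print(select_reference(toy, 2))                            # ['A', 'C']
print(select_reference(toy, 2, weight_by_frequency=True))  # ['B', 'A']
```

In the toy data the unweighted run picks animal C because it carries two haplotypes seen nowhere else, while the frequency-weighted run prefers carriers of the widely shared `h3`. That mirrors the trade-off stated in the conclusion between capturing novel information and covering common haplotypes.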