Genotype Calling from Population-Genomic Sequencing Data

Genotype calling plays important roles in population-genomic studies, which have been greatly accelerated by sequencing technologies. To take full advantage of the resultant information, we have developed maximum-likelihood (ML) methods for calling genotypes from high-throughput sequencing data. As...


Bibliographic Details
Main authors: Maruki, Takahiro; Lynch, Michael
Format: Online Article Text
Language: English
Published: Genetics Society of America 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5427492/
https://www.ncbi.nlm.nih.gov/pubmed/28108551
http://dx.doi.org/10.1534/g3.117.039008
author Maruki, Takahiro
Lynch, Michael
collection PubMed
description Genotype calling plays important roles in population-genomic studies, which have been greatly accelerated by sequencing technologies. To take full advantage of the resultant information, we have developed maximum-likelihood (ML) methods for calling genotypes from high-throughput sequencing data. As the statistical uncertainties associated with sequencing data depend on depths of coverage, we have developed two types of genotype callers. One approach is appropriate for low-coverage sequencing data, and incorporates population-level information on genotype frequencies and error rates pre-estimated by an ML method. Performance evaluation using computer simulations and human data shows that the proposed framework yields less biased estimates of allele frequencies and more accurate genotype calls than current widely used methods. Another type of genotype caller applies to high-coverage sequencing data, requires no prior genotype-frequency estimates, and makes no assumption on the number of alleles at a polymorphic site. Using computer simulations, we determine the depth of coverage necessary to accurately characterize polymorphisms using this second method. We applied the proposed method to high-coverage (mean 18×) sequencing data of 83 clones from a population of Daphnia pulex. The results show that the proposed method enables conservative and reasonably powerful detection of polymorphisms with arbitrary numbers of alleles. We have extended the proposed method to the analysis of genomic data for polyploid organisms, showing that calling accurate polyploid genotypes requires much higher coverage than diploid genotypes.
format Online
Article
Text
id pubmed-5427492
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-5427492 2017-05-12 Genotype Calling from Population-Genomic Sequencing Data Maruki, Takahiro Lynch, Michael G3 (Bethesda) Investigations
Genetics Society of America 2017-01-19 /pmc/articles/PMC5427492/ /pubmed/28108551 http://dx.doi.org/10.1534/g3.117.039008 Text en Copyright © 2017 Maruki and Lynch http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Genotype Calling from Population-Genomic Sequencing Data
topic Investigations