Genotype Calling from Population-Genomic Sequencing Data
Genotype calling plays important roles in population-genomic studies, which have been greatly accelerated by sequencing technologies. To take full advantage of the resultant information, we have developed maximum-likelihood (ML) methods for calling genotypes from high-throughput sequencing data. As...
Main authors: | Maruki, Takahiro; Lynch, Michael |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Genetics Society of America, 2017 |
Subjects: | Investigations |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5427492/ https://www.ncbi.nlm.nih.gov/pubmed/28108551 http://dx.doi.org/10.1534/g3.117.039008 |
_version_ | 1783235636463927296 |
---|---|
author | Maruki, Takahiro Lynch, Michael |
author_facet | Maruki, Takahiro Lynch, Michael |
author_sort | Maruki, Takahiro |
collection | PubMed |
description | Genotype calling plays important roles in population-genomic studies, which have been greatly accelerated by sequencing technologies. To take full advantage of the resultant information, we have developed maximum-likelihood (ML) methods for calling genotypes from high-throughput sequencing data. As the statistical uncertainties associated with sequencing data depend on depths of coverage, we have developed two types of genotype callers. One approach is appropriate for low-coverage sequencing data, and incorporates population-level information on genotype frequencies and error rates pre-estimated by an ML method. Performance evaluation using computer simulations and human data shows that the proposed framework yields less biased estimates of allele frequencies and more accurate genotype calls than current widely used methods. Another type of genotype caller applies to high-coverage sequencing data, requires no prior genotype-frequency estimates, and makes no assumption on the number of alleles at a polymorphic site. Using computer simulations, we determine the depth of coverage necessary to accurately characterize polymorphisms using this second method. We applied the proposed method to high-coverage (mean 18×) sequencing data of 83 clones from a population of Daphnia pulex. The results show that the proposed method enables conservative and reasonably powerful detection of polymorphisms with arbitrary numbers of alleles. We have extended the proposed method to the analysis of genomic data for polyploid organisms, showing that calling accurate polyploid genotypes requires much higher coverage than diploid genotypes. |
format | Online Article Text |
id | pubmed-5427492 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Genetics Society of America |
record_format | MEDLINE/PubMed |
spelling | pubmed-54274922017-05-12 Genotype Calling from Population-Genomic Sequencing Data Maruki, Takahiro Lynch, Michael G3 (Bethesda) Investigations Genotype calling plays important roles in population-genomic studies, which have been greatly accelerated by sequencing technologies. To take full advantage of the resultant information, we have developed maximum-likelihood (ML) methods for calling genotypes from high-throughput sequencing data. As the statistical uncertainties associated with sequencing data depend on depths of coverage, we have developed two types of genotype callers. One approach is appropriate for low-coverage sequencing data, and incorporates population-level information on genotype frequencies and error rates pre-estimated by an ML method. Performance evaluation using computer simulations and human data shows that the proposed framework yields less biased estimates of allele frequencies and more accurate genotype calls than current widely used methods. Another type of genotype caller applies to high-coverage sequencing data, requires no prior genotype-frequency estimates, and makes no assumption on the number of alleles at a polymorphic site. Using computer simulations, we determine the depth of coverage necessary to accurately characterize polymorphisms using this second method. We applied the proposed method to high-coverage (mean 18×) sequencing data of 83 clones from a population of Daphnia pulex. The results show that the proposed method enables conservative and reasonably powerful detection of polymorphisms with arbitrary numbers of alleles. We have extended the proposed method to the analysis of genomic data for polyploid organisms, showing that calling accurate polyploid genotypes requires much higher coverage than diploid genotypes. 
Genetics Society of America 2017-01-19 /pmc/articles/PMC5427492/ /pubmed/28108551 http://dx.doi.org/10.1534/g3.117.039008 Text en Copyright © 2017 Maruki and Lynch http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Investigations Maruki, Takahiro Lynch, Michael Genotype Calling from Population-Genomic Sequencing Data |
title | Genotype Calling from Population-Genomic Sequencing Data |
title_full | Genotype Calling from Population-Genomic Sequencing Data |
title_fullStr | Genotype Calling from Population-Genomic Sequencing Data |
title_full_unstemmed | Genotype Calling from Population-Genomic Sequencing Data |
title_short | Genotype Calling from Population-Genomic Sequencing Data |
title_sort | genotype calling from population-genomic sequencing data |
topic | Investigations |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5427492/ https://www.ncbi.nlm.nih.gov/pubmed/28108551 http://dx.doi.org/10.1534/g3.117.039008 |
work_keys_str_mv | AT marukitakahiro genotypecallingfrompopulationgenomicsequencingdata AT lynchmichael genotypecallingfrompopulationgenomicsequencingdata |
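
The description above refers to maximum-likelihood genotype calling, with an optional use of population-level genotype frequencies at low coverage. As a rough illustration of that general idea only, and not the authors' implementation, the following sketch calls a diploid genotype at a biallelic site from read counts. It assumes a single symmetric error rate and a simple binomial read model; the function names, the genotype labels `AA`/`Aa`/`aa`, and the default error rate of 0.01 are all hypothetical choices made for this example.

```python
import math


def genotype_log_likelihoods(n_ref, n_alt, error_rate=0.01):
    """Log-likelihood of the read counts under each diploid genotype.

    Assumption (not from the record): each read independently shows the
    alternate allele with probability error_rate (AA), 0.5 (Aa), or
    1 - error_rate (aa). The binomial coefficient is omitted because it
    is the same for every genotype.
    """
    p_alt = {"AA": error_rate, "Aa": 0.5, "aa": 1.0 - error_rate}
    return {
        g: n_alt * math.log(p) + n_ref * math.log(1.0 - p)
        for g, p in p_alt.items()
    }


def call_genotype(n_ref, n_alt, error_rate=0.01, genotype_freqs=None):
    """Return the genotype with the highest score.

    If genotype_freqs is given (a dict of strictly positive frequencies
    for "AA", "Aa", "aa"), its logs are added to the log-likelihoods,
    which mimics using pre-estimated population genotype frequencies.
    """
    scores = genotype_log_likelihoods(n_ref, n_alt, error_rate)
    if genotype_freqs is not None:
        for g in scores:
            scores[g] += math.log(genotype_freqs[g])
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(call_genotype(10, 0))
    print(call_genotype(5, 5))
    print(call_genotype(2, 0, genotype_freqs={"AA": 0.01, "Aa": 0.98, "aa": 0.01}))
```

With only two reads, the last call shows how a strong population-level expectation can outweigh the read evidence, which is one reason low-coverage and high-coverage data may warrant different callers. A real caller would also need to handle more than two alleles and site-specific error estimates, which this sketch does not attempt.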