Population-Genetic Inference from Pooled-Sequencing Data
Although pooled-population sequencing has become a widely used approach for estimating allele frequencies, most work has proceeded in the absence of a proper statistical framework. We introduce a self-sufficient, closed-form, maximum-likelihood estimator for allele frequencies that accounts for erro...
Main Authors: | Lynch, Michael; Bost, Darius; Wilson, Sade; Maruki, Takahiro; Harrison, Scott |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2014 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4040993/ https://www.ncbi.nlm.nih.gov/pubmed/24787620 http://dx.doi.org/10.1093/gbe/evu085 |
_version_ | 1782318619843100672 |
---|---|
author | Lynch, Michael Bost, Darius Wilson, Sade Maruki, Takahiro Harrison, Scott |
author_facet | Lynch, Michael Bost, Darius Wilson, Sade Maruki, Takahiro Harrison, Scott |
author_sort | Lynch, Michael |
collection | PubMed |
description | Although pooled-population sequencing has become a widely used approach for estimating allele frequencies, most work has proceeded in the absence of a proper statistical framework. We introduce a self-sufficient, closed-form, maximum-likelihood estimator for allele frequencies that accounts for errors associated with sequencing, and a likelihood-ratio test statistic that provides a simple means for evaluating the null hypothesis of monomorphism. Unbiased estimates of allele frequencies [Image: see text] (where N is the number of individuals sampled) appear to be unachievable, and near-certain identification of a polymorphism requires a minor-allele frequency [Image: see text]. A framework is provided for testing for significant differences in allele frequencies between populations, taking into account sampling at the levels of individuals within populations and sequences within pooled samples. Analyses that fail to account for the two tiers of sampling suffer from very large false-positive rates and can become increasingly misleading with increasing depths of sequence coverage. The power to detect significant allele-frequency differences between two populations is very limited unless both the number of sampled individuals and depth of sequencing coverage exceed 100. |
format | Online Article Text |
id | pubmed-4040993 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-40409932014-06-02 Population-Genetic Inference from Pooled-Sequencing Data Lynch, Michael Bost, Darius Wilson, Sade Maruki, Takahiro Harrison, Scott Genome Biol Evol Research Article Although pooled-population sequencing has become a widely used approach for estimating allele frequencies, most work has proceeded in the absence of a proper statistical framework. We introduce a self-sufficient, closed-form, maximum-likelihood estimator for allele frequencies that accounts for errors associated with sequencing, and a likelihood-ratio test statistic that provides a simple means for evaluating the null hypothesis of monomorphism. Unbiased estimates of allele frequencies [Image: see text] (where N is the number of individuals sampled) appear to be unachievable, and near-certain identification of a polymorphism requires a minor-allele frequency [Image: see text]. A framework is provided for testing for significant differences in allele frequencies between populations, taking into account sampling at the levels of individuals within populations and sequences within pooled samples. Analyses that fail to account for the two tiers of sampling suffer from very large false-positive rates and can become increasingly misleading with increasing depths of sequence coverage. The power to detect significant allele-frequency differences between two populations is very limited unless both the number of sampled individuals and depth of sequencing coverage exceed 100. Oxford University Press 2014-04-30 /pmc/articles/PMC4040993/ /pubmed/24787620 http://dx.doi.org/10.1093/gbe/evu085 Text en © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. 
http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Lynch, Michael Bost, Darius Wilson, Sade Maruki, Takahiro Harrison, Scott Population-Genetic Inference from Pooled-Sequencing Data |
title | Population-Genetic Inference from Pooled-Sequencing Data |
title_full | Population-Genetic Inference from Pooled-Sequencing Data |
title_fullStr | Population-Genetic Inference from Pooled-Sequencing Data |
title_full_unstemmed | Population-Genetic Inference from Pooled-Sequencing Data |
title_short | Population-Genetic Inference from Pooled-Sequencing Data |
title_sort | population-genetic inference from pooled-sequencing data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4040993/ https://www.ncbi.nlm.nih.gov/pubmed/24787620 http://dx.doi.org/10.1093/gbe/evu085 |
work_keys_str_mv | AT lynchmichael populationgeneticinferencefrompooledsequencingdata AT bostdarius populationgeneticinferencefrompooledsequencingdata AT wilsonsade populationgeneticinferencefrompooledsequencingdata AT marukitakahiro populationgeneticinferencefrompooledsequencingdata AT harrisonscott populationgeneticinferencefrompooledsequencingdata |
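
The description's warning about two tiers of sampling (individuals within populations, then sequences within the pool) can be illustrated with a rough calculation. The sketch below is not the authors' framework. It assumes diploid individuals, equal sample size and coverage in both populations, no sequencing error, and a normal approximation. The function names are hypothetical. It compares the variance of a read-based allele-frequency estimate under both tiers with the variance a reads-only analysis would assume, and converts that ratio into the approximate false-positive rate of a naive two-sided test at a nominal 5% level.

```python
import math


def variance_inflation(n_individuals: int, coverage: int) -> float:
    """Ratio of two-tier sampling variance to the naive reads-only variance.

    With 2N chromosomes in the pool (diploid assumption) and n reads:
      two-tier variance = p(1-p)/(2N) + p(1-p)(1 - 1/(2N))/n
      naive variance    = p(1-p)/n
    The p(1-p) factor cancels, so the ratio does not depend on p.
    """
    chromosomes = 2 * n_individuals
    return (1.0 - 1.0 / chromosomes) + coverage / chromosomes


def naive_false_positive_rate(n_individuals: int, coverage: int,
                              z_crit: float = 1.959964) -> float:
    """Approximate type I error of a test that ignores individual sampling.

    The naive statistic is too large by sqrt(ratio), so the nominal
    critical value z_crit behaves like z_crit / sqrt(ratio).
    """
    ratio = variance_inflation(n_individuals, coverage)
    z_effective = z_crit / math.sqrt(ratio)
    # Two-sided standard-normal tail probability: 2 * (1 - Phi(z)).
    return math.erfc(z_effective / math.sqrt(2.0))


if __name__ == "__main__":
    n_individuals = 50  # illustrative value, not taken from the paper
    for coverage in (10, 100, 1000):
        rate = naive_false_positive_rate(n_individuals, coverage)
        print(f"N={n_individuals}, coverage={coverage}: "
              f"approx. false-positive rate {rate:.3f}")
```

Under these assumptions the inflation ratio grows roughly as coverage / (2N), which is consistent with the description's statement that analyses ignoring the second tier become more misleading as sequencing depth increases.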