Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data
DNA samples are often pooled, either by experimental design or because the sample itself is a mixture. For example, when population allele frequencies are of primary interest, individual samples may be pooled together to lower the cost of sequencing. Alternatively, the sample itself may be a mixture...
Main authors: | Kessner, Darren; Turner, Thomas L.; Novembre, John |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2013 |
Subjects: | Methods |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3670732/ https://www.ncbi.nlm.nih.gov/pubmed/23364324 http://dx.doi.org/10.1093/molbev/mst016 |
_version_ | 1782271885406371840 |
---|---|
author | Kessner, Darren; Turner, Thomas L.; Novembre, John |
author_sort | Kessner, Darren |
collection | PubMed |
description | DNA samples are often pooled, either by experimental design or because the sample itself is a mixture. For example, when population allele frequencies are of primary interest, individual samples may be pooled together to lower the cost of sequencing. Alternatively, the sample itself may be a mixture of multiple species or strains (e.g., bacterial species comprising a microbiome or pathogen strains in a blood sample). We present an expectation–maximization algorithm for estimating haplotype frequencies in a pooled sample directly from mapped sequence reads, in the case where the possible haplotypes are known. This method is relevant to the analysis of pooled sequencing data from selection experiments, as well as the calculation of proportions of different species within a metagenomics sample. Our method outperforms existing methods based on single-site allele frequencies, as well as simple approaches using sequence read data. We have implemented the method in a freely available open-source software tool. |
format | Online Article Text |
id | pubmed-3670732 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-36707322013-06-03 Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data Kessner, Darren Turner, Thomas L. Novembre, John Mol Biol Evol Methods DNA samples are often pooled, either by experimental design or because the sample itself is a mixture. For example, when population allele frequencies are of primary interest, individual samples may be pooled together to lower the cost of sequencing. Alternatively, the sample itself may be a mixture of multiple species or strains (e.g., bacterial species comprising a microbiome or pathogen strains in a blood sample). We present an expectation–maximization algorithm for estimating haplotype frequencies in a pooled sample directly from mapped sequence reads, in the case where the possible haplotypes are known. This method is relevant to the analysis of pooled sequencing data from selection experiments, as well as the calculation of proportions of different species within a metagenomics sample. Our method outperforms existing methods based on single-site allele frequencies, as well as simple approaches using sequence read data. We have implemented the method in a freely available open-source software tool. Oxford University Press 2013-05 2013-01-30 /pmc/articles/PMC3670732/ /pubmed/23364324 http://dx.doi.org/10.1093/molbev/mst016 Text en © The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Kessner, Darren Turner, Thomas L. Novembre, John Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data |
title | Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data |
topic | Methods |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3670732/ https://www.ncbi.nlm.nih.gov/pubmed/23364324 http://dx.doi.org/10.1093/molbev/mst016 |
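
The description above mentions an expectation-maximization algorithm for estimating the frequencies of known haplotypes from mapped reads. The following is only a minimal, generic sketch of that kind of mixture-proportion EM, not the authors' implementation. The function names (`read_likelihood`, `em_haplotype_frequencies`), the uniform per-base error model, and the toy haplotypes and reads are all assumptions made for illustration.

```python
def read_likelihood(read, haplotype_segment, error_rate=0.01):
    """P(read | haplotype) under an assumed uniform per-base error model.

    A matching base contributes (1 - error_rate); a mismatch contributes
    error_rate / 3 (each of the three wrong bases equally likely).
    """
    prob = 1.0
    for observed, expected in zip(read, haplotype_segment):
        prob *= (1.0 - error_rate) if observed == expected else error_rate / 3.0
    return prob


def em_haplotype_frequencies(read_likelihoods, n_iter=200, tol=1e-10):
    """Estimate mixture proportions of known haplotypes by EM.

    read_likelihoods[r][h] is a precomputed P(read r | haplotype h).
    Returns a list of estimated haplotype frequencies that sums to 1.
    """
    n_haps = len(read_likelihoods[0])
    freqs = [1.0 / n_haps] * n_haps  # start from uniform frequencies
    for _ in range(n_iter):
        # E-step: accumulate each read's posterior responsibility per haplotype.
        counts = [0.0] * n_haps
        for lik in read_likelihoods:
            weighted = [f * l for f, l in zip(freqs, lik)]
            total = sum(weighted)
            if total == 0.0:
                continue  # read is incompatible with every haplotype; skip it
            for h, w in enumerate(weighted):
                counts[h] += w / total
        norm = sum(counts)
        if norm == 0.0:
            raise ValueError("no read is compatible with any haplotype")
        # M-step: new frequencies are the normalized expected read counts.
        new_freqs = [c / norm for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


# Toy usage: two hypothetical haplotypes differing at one site, four reads.
haplotypes = ["ACGT", "ACTT"]
reads = ["ACGT", "ACGT", "ACGT", "ACTT"]
likelihoods = [[read_likelihood(r, h) for h in haplotypes] for r in reads]
estimated = em_haplotype_frequencies(likelihoods)
```

In this toy setup three of the four reads match the first haplotype, so the estimate for it lands close to 0.75. Real pooled data would need reads aligned to the correct haplotype coordinates and base-quality-aware likelihoods, which this sketch leaves out.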