pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree
BACKGROUND: Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computa...
Main authors: | Matsen, Frederick A; Kodner, Robin B; Armbrust, E Virginia |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098090/ https://www.ncbi.nlm.nih.gov/pubmed/21034504 http://dx.doi.org/10.1186/1471-2105-11-538 |
_version_ | 1782203915173888000 |
---|---|
author | Matsen, Frederick A Kodner, Robin B Armbrust, E Virginia |
author_facet | Matsen, Frederick A Kodner, Robin B Armbrust, E Virginia |
author_sort | Matsen, Frederick A |
collection | PubMed |
description | BACKGROUND: Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computational complexity issues and lack of phylogenetic signal. "Phylogenetic placement," where a reference tree is fixed and the unknown query sequences are placed onto the tree via a reference alignment, is a way to bring the inferential power offered by likelihood-based approaches to large data sets. RESULTS: This paper introduces pplacer, a software package for phylogenetic placement and subsequent visualization. The algorithm can place twenty thousand short reads on a reference tree of one thousand taxa per hour per processor, has essentially linear time and memory complexity in the number of reference taxa, and is easy to run in parallel. Pplacer features calculation of the posterior probability of a placement on an edge, which is a statistically rigorous way of quantifying uncertainty on an edge-by-edge basis. It also can inform the user of the positional uncertainty for query sequences by calculating expected distance between placement locations, which is crucial in the estimation of uncertainty with a well-sampled reference tree. The software provides visualizations using branch thickness and color to represent number of placements and their uncertainty. A simulation study using reads generated from 631 COG alignments shows a high level of accuracy for phylogenetic placement over a wide range of alignment diversity, and the power of edge uncertainty estimates to measure placement confidence. CONCLUSIONS: Pplacer enables efficient phylogenetic placement and subsequent visualization, making likelihood-based phylogenetics methodology practical for large collections of reads; it is freely available as source code, binaries, and a web service. |
format | Text |
id | pubmed-3098090 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3098090 2011-07-08 pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree Matsen, Frederick A Kodner, Robin B Armbrust, E Virginia BMC Bioinformatics Methodology Article BACKGROUND: Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computational complexity issues and lack of phylogenetic signal. "Phylogenetic placement," where a reference tree is fixed and the unknown query sequences are placed onto the tree via a reference alignment, is a way to bring the inferential power offered by likelihood-based approaches to large data sets. RESULTS: This paper introduces pplacer, a software package for phylogenetic placement and subsequent visualization. The algorithm can place twenty thousand short reads on a reference tree of one thousand taxa per hour per processor, has essentially linear time and memory complexity in the number of reference taxa, and is easy to run in parallel. Pplacer features calculation of the posterior probability of a placement on an edge, which is a statistically rigorous way of quantifying uncertainty on an edge-by-edge basis. It also can inform the user of the positional uncertainty for query sequences by calculating expected distance between placement locations, which is crucial in the estimation of uncertainty with a well-sampled reference tree. The software provides visualizations using branch thickness and color to represent number of placements and their uncertainty. A simulation study using reads generated from 631 COG alignments shows a high level of accuracy for phylogenetic placement over a wide range of alignment diversity, and the power of edge uncertainty estimates to measure placement confidence. CONCLUSIONS: Pplacer enables efficient phylogenetic placement and subsequent visualization, making likelihood-based phylogenetics methodology practical for large collections of reads; it is freely available as source code, binaries, and a web service. BioMed Central 2010-10-30 /pmc/articles/PMC3098090/ /pubmed/21034504 http://dx.doi.org/10.1186/1471-2105-11-538 Text en Copyright ©2010 Matsen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Matsen, Frederick A Kodner, Robin B Armbrust, E Virginia pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree |
title | pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree |
title_full | pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree |
title_fullStr | pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree |
title_full_unstemmed | pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree |
title_short | pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree |
title_sort | pplacer: linear time maximum-likelihood and bayesian phylogenetic placement of sequences onto a fixed reference tree |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098090/ https://www.ncbi.nlm.nih.gov/pubmed/21034504 http://dx.doi.org/10.1186/1471-2105-11-538 |
work_keys_str_mv | AT matsenfredericka pplacerlineartimemaximumlikelihoodandbayesianphylogeneticplacementofsequencesontoafixedreferencetree AT kodnerrobinb pplacerlineartimemaximumlikelihoodandbayesianphylogeneticplacementofsequencesontoafixedreferencetree AT armbrustevirginia pplacerlineartimemaximumlikelihoodandbayesianphylogeneticplacementofsequencesontoafixedreferencetree |
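
The abstract in this record mentions per-edge placement uncertainty, an expected distance between placement locations, and a throughput of twenty thousand reads per hour per processor. The Python sketch below illustrates those three ideas on toy numbers. It is not pplacer's code or interface: the function names, the input format, and the pairwise-distance form of the expected distance are assumptions made for illustration, and the normalized likelihood weights stand in for, but are not the same as, the Bayesian posterior probabilities the abstract describes.

```python
import math


def likelihood_weights(log_likelihoods):
    """Normalize per-edge log-likelihoods into weights that sum to 1.

    Assumption: one log-likelihood per candidate edge for a single query.
    Subtracting the maximum first avoids underflow in math.exp.
    """
    top = max(log_likelihoods)
    raw = [math.exp(ll - top) for ll in log_likelihoods]
    total = sum(raw)
    return [r / total for r in raw]


def expected_distance(weights, distances):
    """Expected distance between two placement locations.

    Assumption: both locations are drawn independently according to
    `weights`, and `distances[i][j]` is the tree distance between
    candidate locations i and j (symmetric, zero on the diagonal).
    """
    n = len(weights)
    return sum(
        weights[i] * weights[j] * distances[i][j]
        for i in range(n)
        for j in range(n)
    )


def hours_needed(n_reads, n_processors, reads_per_hour=20_000):
    """Rough wall-clock estimate from the throughput quoted in the abstract.

    Assumption: the quoted rate (for a reference tree of about one
    thousand taxa) holds and the work splits evenly across processors.
    """
    return n_reads / (reads_per_hour * n_processors)


if __name__ == "__main__":
    # Two equally likely candidate edges that are 2.0 apart on the tree.
    w = likelihood_weights([-100.0, -100.0])
    print(w)
    print(expected_distance(w, [[0.0, 2.0], [2.0, 0.0]]))
    print(hours_needed(100_000, 5))
```

A query whose weight sits on one edge gives an expected distance of zero, while weight spread over distant edges gives a large value. This is why the abstract calls the measure useful with a well-sampled reference tree: weight split across neighbouring edges signals less positional uncertainty than the same split across distant ones.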