Methods for automatic reference trees and multilevel phylogenetic placement

Bibliographic Details
Main Authors: Czech, Lucas, Barbera, Pierre, Stamatakis, Alexandros
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6449752/
https://www.ncbi.nlm.nih.gov/pubmed/30169747
http://dx.doi.org/10.1093/bioinformatics/bty767
_version_ 1783408916243152896
author Czech, Lucas
Barbera, Pierre
Stamatakis, Alexandros
author_facet Czech, Lucas
Barbera, Pierre
Stamatakis, Alexandros
author_sort Czech, Lucas
collection PubMed
description MOTIVATION: In most metagenomic sequencing studies, the initial analysis step consists in assessing the evolutionary provenance of the sequences. Phylogenetic (or Evolutionary) Placement methods can be employed to determine the evolutionary position of sequences with respect to a given reference phylogeny. These placement methods do, however, face certain limitations: the manual selection of reference sequences is labor-intensive; the computational effort to infer reference phylogenies is substantially larger than for methods that rely on sequence similarity; and the number of taxa in the reference phylogeny should be small enough to allow for visually inspecting the results. RESULTS: We present algorithms to overcome the above limitations. First, we introduce a method to automatically construct representative sequences from databases to infer reference phylogenies. Second, we present an approach for conducting large-scale phylogenetic placements on nested phylogenies. Third, we describe a preprocessing pipeline that allows for handling huge sequence datasets. Our experiments on empirical data show that our methods substantially accelerate the workflow and yield highly accurate placement results. AVAILABILITY AND IMPLEMENTATION: Freely available under GPLv3 at http://github.com/lczech/gappa. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6449752
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6449752 2019-04-09 Methods for automatic reference trees and multilevel phylogenetic placement Czech, Lucas Barbera, Pierre Stamatakis, Alexandros Bioinformatics Original Papers Oxford University Press 2019-04-01 2018-08-31 /pmc/articles/PMC6449752/ /pubmed/30169747 http://dx.doi.org/10.1093/bioinformatics/bty767 Text en © The Author(s) 2018. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Papers
Czech, Lucas
Barbera, Pierre
Stamatakis, Alexandros
Methods for automatic reference trees and multilevel phylogenetic placement
title Methods for automatic reference trees and multilevel phylogenetic placement
title_full Methods for automatic reference trees and multilevel phylogenetic placement
title_fullStr Methods for automatic reference trees and multilevel phylogenetic placement
title_full_unstemmed Methods for automatic reference trees and multilevel phylogenetic placement
title_short Methods for automatic reference trees and multilevel phylogenetic placement
title_sort methods for automatic reference trees and multilevel phylogenetic placement
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6449752/
https://www.ncbi.nlm.nih.gov/pubmed/30169747
http://dx.doi.org/10.1093/bioinformatics/bty767
work_keys_str_mv AT czechlucas methodsforautomaticreferencetreesandmultilevelphylogeneticplacement
AT barberapierre methodsforautomaticreferencetreesandmultilevelphylogeneticplacement
AT stamatakisalexandros methodsforautomaticreferencetreesandmultilevelphylogeneticplacement