
Scalable methods for analyzing and visualizing phylogenetic placement of metagenomic samples

BACKGROUND: The exponential decrease in molecular sequencing cost generates unprecedented amounts of data. Hence, scalable methods to analyze these data are required. Phylogenetic (or Evolutionary) Placement methods identify the evolutionary provenance of anonymous sequences with respect to a given...


Bibliographic Details
Main authors: Czech, Lucas; Stamatakis, Alexandros
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6538146/
https://www.ncbi.nlm.nih.gov/pubmed/31136592
http://dx.doi.org/10.1371/journal.pone.0217050
author Czech, Lucas
Stamatakis, Alexandros
collection PubMed
description BACKGROUND: The exponential decrease in molecular sequencing cost generates unprecedented amounts of data. Hence, scalable methods to analyze these data are required. Phylogenetic (or Evolutionary) Placement methods identify the evolutionary provenance of anonymous sequences with respect to a given reference phylogeny. This increasingly popular method is deployed for scrutinizing metagenomic samples from environments such as water, soil, or the human gut. NOVEL METHODS: Here, we present novel and, more importantly, highly scalable methods for analyzing phylogenetic placements of metagenomic samples. More specifically, we introduce methods for (a) visualizing differences between samples and their correlation with associated meta-data on the reference phylogeny, (b) clustering similar samples using a variant of the k-means method, and (c) finding phylogenetic factors using an adaptation of the Phylofactorization method. These methods make it possible to interpret metagenomic data in a phylogenetic context, to find patterns in the data, and to identify the branches of the phylogeny that drive these patterns. RESULTS: To demonstrate the scalability and utility of our methods, and to provide example interpretations of their output, we applied them to three publicly available datasets comprising 9782 samples with a total of approximately 168 million sequences. The results indicate that new biological insights can be attained via our methods.
format Online
Article
Text
id pubmed-6538146
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6538146 2019-06-05 Scalable methods for analyzing and visualizing phylogenetic placement of metagenomic samples Czech, Lucas Stamatakis, Alexandros PLoS One Research Article
Public Library of Science 2019-05-28 /pmc/articles/PMC6538146/ /pubmed/31136592 http://dx.doi.org/10.1371/journal.pone.0217050 Text en © 2019 Czech, Stamatakis http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Scalable methods for analyzing and visualizing phylogenetic placement of metagenomic samples
topic Research Article