Explaining Diversity in Metagenomic Datasets by Phylogenetic-Based Feature Weighting

Bibliographic Details
Main authors: Albanese, Davide; De Filippo, Carlotta; Cavalieri, Duccio; Donati, Claudio
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4376673/
https://www.ncbi.nlm.nih.gov/pubmed/25815895
http://dx.doi.org/10.1371/journal.pcbi.1004186

Collection: PubMed

Abstract: Metagenomics is revolutionizing our understanding of microbial communities, showing that their structure and composition have profound effects on the ecosystem and in a variety of health and disease conditions. Despite the flourishing of new analysis methods, current approaches based on statistical comparisons between high-level taxonomic classes often fail to identify the microbial taxa that are differentially distributed between sets of samples, since in many cases the taxonomic schema do not allow an adequate description of the structure of the microbiota. This constitutes a severe limitation to the use of metagenomic data in therapeutic and diagnostic applications. To provide a more robust statistical framework, we introduce a class of feature-weighting algorithms that discriminate the taxa responsible for the classification of metagenomic samples. The method unambiguously groups the relevant taxa into clades without relying on pre-defined taxonomic categories, thus including in the analysis also those sequences for which a taxonomic classification is difficult. The phylogenetic clades are weighted and ranked according to their abundance measuring their contribution to the differentiation of the classes of samples, and a criterion is provided to define a reduced set of most relevant clades. Applying the method to public datasets, we show that the data-driven definition of relevant phylogenetic clades accomplished by our ranking strategy identifies features in the samples that are lost if phylogenetic relationships are not considered, improving our ability to mine metagenomic datasets. Comparison with supervised classification methods currently used in metagenomic data analysis highlights the advantages of using phylogenetic information.
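
The abstract describes the method only at a high level. As a rough, hypothetical illustration of the two ideas it names (grouping taxa into phylogenetic clades rather than fixed taxonomic ranks, and weighting clades by how well their abundance separates classes of samples), the Python sketch below sums OTU abundances over every clade of a toy tree and scores each clade with the textbook Relief feature-weighting rule, in a deterministic variant that visits every sample once. The tree, counts, labels, and function names (clade_profile, relief_weights, rank_clades) are invented for this example; the record does not say that the authors use this exact scoring, and the sketch omits the criterion the abstract mentions for selecting a reduced set of clades.

```python
"""Toy sketch (hypothetical data and names): sum OTU abundances over the clades
of a phylogenetic tree, then weight every clade with the basic Relief rule."""

# Hypothetical rooted tree: internal node -> children; names not in the dict are leaves (OTUs).
TREE = {"root": ["A", "B"], "A": ["otu1", "otu2"], "B": ["otu3", "otu4"]}

# Hypothetical OTU read counts per sample, with one class label per sample.
SAMPLES = [
    {"otu1": 8, "otu2": 0, "otu3": 1, "otu4": 1},
    {"otu1": 0, "otu2": 8, "otu3": 1, "otu4": 1},
    {"otu1": 1, "otu2": 1, "otu3": 8, "otu4": 0},
    {"otu1": 1, "otu2": 1, "otu3": 0, "otu4": 8},
]
LABELS = ["healthy", "healthy", "disease", "disease"]


def all_nodes(tree):
    """Every node of the tree: internal clades plus leaf OTUs."""
    nodes = set(tree)
    for children in tree.values():
        nodes.update(children)
    return sorted(nodes)


def clade_leaves(tree, node):
    """Set of leaf OTUs descending from `node` (a leaf is its own clade)."""
    children = tree.get(node, [])
    if not children:
        return {node}
    leaves = set()
    for child in children:
        leaves |= clade_leaves(tree, child)
    return leaves


def clade_profile(tree, otu_counts):
    """Relative abundance of every clade in one sample."""
    total = sum(otu_counts.values())
    return {
        node: sum(otu_counts.get(otu, 0) for otu in clade_leaves(tree, node)) / total
        for node in all_nodes(tree)
    }


def relief_weights(profiles, labels):
    """Basic Relief: a clade gains weight when it differs more between a sample
    and its nearest miss (other class) than between the sample and its nearest hit."""
    features = sorted(profiles[0])
    # Per-feature range used to normalize differences; zero-range features get 1.0.
    span = {f: (max(p[f] for p in profiles) - min(p[f] for p in profiles)) or 1.0
            for f in features}

    def diff(f, a, b):
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in features)

    weights = {f: 0.0 for f in features}
    n = len(profiles)
    for i, x in enumerate(profiles):
        others = [j for j in range(n) if j != i]
        hit = min((j for j in others if labels[j] == labels[i]),
                  key=lambda j: dist(x, profiles[j]))
        miss = min((j for j in others if labels[j] != labels[i]),
                   key=lambda j: dist(x, profiles[j]))
        for f in features:
            weights[f] += (diff(f, x, profiles[miss]) - diff(f, x, profiles[hit])) / n
    return weights


def rank_clades(weights):
    """Clade names sorted from most to least discriminative."""
    return sorted(weights, key=weights.get, reverse=True)


if __name__ == "__main__":
    profiles = [clade_profile(TREE, s) for s in SAMPLES]
    weights = relief_weights(profiles, LABELS)
    for node in rank_clades(weights):
        print(f"{node}\t{weights[node]:+.3f}")
```

In this toy data the class signal is split across sibling OTUs inside clade A (or B), so each individual OTU scores low while the clades containing them score highest. This is the kind of feature the abstract says is lost when phylogenetic relationships are ignored.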

Record ID: pubmed-4376673
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS Comput Biol
Publication date: 2015-03-27
Copyright: © 2015 Albanese et al.
License: Creative Commons Attribution 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
Article type: Research Article