Explaining Diversity in Metagenomic Datasets by Phylogenetic-Based Feature Weighting
Metagenomics is revolutionizing our understanding of microbial communities, showing that their structure and composition have profound effects on the ecosystem and in a variety of health and disease conditions. Despite the flourishing of new analysis methods, current approaches based on statistical...
Main authors: | Albanese, Davide; De Filippo, Carlotta; Cavalieri, Duccio; Donati, Claudio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4376673/ https://www.ncbi.nlm.nih.gov/pubmed/25815895 http://dx.doi.org/10.1371/journal.pcbi.1004186 |
_version_ | 1782363764655390720 |
---|---|
author | Albanese, Davide De Filippo, Carlotta Cavalieri, Duccio Donati, Claudio |
author_facet | Albanese, Davide De Filippo, Carlotta Cavalieri, Duccio Donati, Claudio |
author_sort | Albanese, Davide |
collection | PubMed |
description | Metagenomics is revolutionizing our understanding of microbial communities, showing that their structure and composition have profound effects on the ecosystem and in a variety of health and disease conditions. Despite the flourishing of new analysis methods, current approaches based on statistical comparisons between high-level taxonomic classes often fail to identify the microbial taxa that are differentially distributed between sets of samples, since in many cases the taxonomic schema do not allow an adequate description of the structure of the microbiota. This constitutes a severe limitation to the use of metagenomic data in therapeutic and diagnostic applications. To provide a more robust statistical framework, we introduce a class of feature-weighting algorithms that discriminate the taxa responsible for the classification of metagenomic samples. The method unambiguously groups the relevant taxa into clades without relying on pre-defined taxonomic categories, thus including in the analysis also those sequences for which a taxonomic classification is difficult. The phylogenetic clades are weighted and ranked according to their abundance measuring their contribution to the differentiation of the classes of samples, and a criterion is provided to define a reduced set of most relevant clades. Applying the method to public datasets, we show that the data-driven definition of relevant phylogenetic clades accomplished by our ranking strategy identifies features in the samples that are lost if phylogenetic relationships are not considered, improving our ability to mine metagenomic datasets. Comparison with supervised classification methods currently used in metagenomic data analysis highlights the advantages of using phylogenetic information. |
format | Online Article Text |
id | pubmed-4376673 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4376673 2015-04-04 Explaining Diversity in Metagenomic Datasets by Phylogenetic-Based Feature Weighting Albanese, Davide De Filippo, Carlotta Cavalieri, Duccio Donati, Claudio PLoS Comput Biol Research Article
Public Library of Science 2015-03-27 /pmc/articles/PMC4376673/ /pubmed/25815895 http://dx.doi.org/10.1371/journal.pcbi.1004186 Text en © 2015 Albanese et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Albanese, Davide De Filippo, Carlotta Cavalieri, Duccio Donati, Claudio Explaining Diversity in Metagenomic Datasets by Phylogenetic-Based Feature Weighting |
title | Explaining Diversity in Metagenomic Datasets by Phylogenetic-Based Feature Weighting |
title_full | Explaining Diversity in Metagenomic Datasets by Phylogenetic-Based Feature Weighting |
title_fullStr | Explaining Diversity in Metagenomic Datasets by Phylogenetic-Based Feature Weighting |
title_full_unstemmed | Explaining Diversity in Metagenomic Datasets by Phylogenetic-Based Feature Weighting |
title_short | Explaining Diversity in Metagenomic Datasets by Phylogenetic-Based Feature Weighting |
title_sort | explaining diversity in metagenomic datasets by phylogenetic-based feature weighting |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4376673/ https://www.ncbi.nlm.nih.gov/pubmed/25815895 http://dx.doi.org/10.1371/journal.pcbi.1004186 |
work_keys_str_mv | AT albanesedavide explainingdiversityinmetagenomicdatasetsbyphylogeneticbasedfeatureweighting AT defilippocarlotta explainingdiversityinmetagenomicdatasetsbyphylogeneticbasedfeatureweighting AT cavalieriduccio explainingdiversityinmetagenomicdatasetsbyphylogeneticbasedfeatureweighting AT donaticlaudio explainingdiversityinmetagenomicdatasetsbyphylogeneticbasedfeatureweighting |