From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

BACKGROUND: Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has limited resolution for discriminating bacteria at the species level. In this paper, we approach the species classification...

Bibliographic Details
Main Authors: Slabbinck, Bram, Waegeman, Willem, Dawyndt, Peter, De Vos, Paul, De Baets, Bernard
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2828439/
https://www.ncbi.nlm.nih.gov/pubmed/20113515
http://dx.doi.org/10.1186/1471-2105-11-69
_version_ 1782178008752193536
author Slabbinck, Bram
Waegeman, Willem
Dawyndt, Peter
De Vos, Paul
De Baets, Bernard
collection PubMed
description BACKGROUND: Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has limited resolution for discriminating bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms to FAME data or to 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. RESULTS: To learn in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Second, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the neighbor-joining (NJ) and UPGMA methods. In this second approach, the species classification problem combines two different types of data: 16S rRNA gene sequence data is used for phylogenetic tree inference, and the corresponding binary tree splits are learned from FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are trained for the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish with a single, flat multi-class classification model. CONCLUSIONS: FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it is better at distinguishing species on which flat multi-class classification fails. Second, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. In summary, phylogenetic learning allows us to situate and evaluate FAME-based bacterial species classification in a more informative context.
format Text
id pubmed-2828439
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2828439 2010-02-25 From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification Slabbinck, Bram Waegeman, Willem Dawyndt, Peter De Vos, Paul De Baets, Bernard BMC Bioinformatics Research Article BioMed Central 2010-01-30 /pmc/articles/PMC2828439/ /pubmed/20113515 http://dx.doi.org/10.1186/1471-2105-11-69 Text en Copyright © 2010 Slabbinck et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2828439/
https://www.ncbi.nlm.nih.gov/pubmed/20113515
http://dx.doi.org/10.1186/1471-2105-11-69
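
The 'phylogenetic learning' idea summarized in the description field can be illustrated with a small sketch: a binary tree over species is fixed in advance (in the paper it is inferred from 16S rRNA gene sequences), and one binary classifier per internal split is trained on FAME profiles. The code below is an illustration under stated assumptions, not the authors' implementation. The tree, the species names (sp_A, sp_B, sp_C), and the three-component FAME-like vectors are invented toy values, and a nearest-centroid rule stands in for the Random Forest models so that the example needs only the Python standard library.

```python
from statistics import fmean

# Hypothetical binary tree over species (assumption, not from the paper):
# a leaf is a species name, an internal node is a (left, right) tuple.
TREE = (("sp_A", "sp_B"), "sp_C")

def leaves(node):
    """Return the set of species names found under a tree node."""
    if isinstance(node, str):
        return {node}
    return leaves(node[0]) | leaves(node[1])

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(node, X, y):
    """Fit one binary split model per internal node of the tree.

    Each split model is a pair of centroids (a stand-in for the Random
    Forest used in the paper), computed from the samples whose species
    fall under the left and right subtrees respectively.
    """
    if isinstance(node, str):
        return node
    left, right = node
    left_rows = [x for x, label in zip(X, y) if label in leaves(left)]
    right_rows = [x for x, label in zip(X, y) if label in leaves(right)]
    return {
        "left_centroid": centroid(left_rows),
        "right_centroid": centroid(right_rows),
        "left": train(left, X, y),
        "right": train(right, X, y),
    }

def predict(model, x):
    """Walk from the root to a leaf, choosing the nearer centroid at each split."""
    while not isinstance(model, str):
        nearer_left = (sq_dist(x, model["left_centroid"])
                       <= sq_dist(x, model["right_centroid"]))
        model = model["left"] if nearer_left else model["right"]
    return model

# Invented toy FAME-like profiles (relative peak areas), two per species.
X = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0],
     [0.6, 0.4, 0.0], [0.5, 0.5, 0.0],
     [0.1, 0.1, 0.8], [0.0, 0.2, 0.8]]
y = ["sp_A", "sp_A", "sp_B", "sp_B", "sp_C", "sp_C"]

model = train(TREE, X, y)
print(predict(model, [0.85, 0.15, 0.0]))
```

Because every internal node holds its own binary model, each split can be scored separately (for example with stratified cross-validation), which is what makes the per-split resolution of the FAME data visible in a hierarchical setup of this kind.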