Taxonomy-aware feature engineering for microbiome classification

Bibliographic Details
Main authors: Oudah, Mai; Henschel, Andreas
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6003080/
https://www.ncbi.nlm.nih.gov/pubmed/29907097
http://dx.doi.org/10.1186/s12859-018-2205-3
author Oudah, Mai
Henschel, Andreas
collection PubMed
description BACKGROUND: What is a healthy microbiome? The pursuit of this and many related questions, especially in light of the recently recognized microbial component in a wide range of diseases, has sparked a surge in metagenomic studies. Such diseases are often not simply attributable to a single pathogen but are rather the result of complex ecological processes. Relatedly, the increasing DNA sequencing depth and number of samples in metagenomic case-control studies have enabled the application of powerful statistical methods, e.g. Machine Learning approaches. For the latter, the feature space is typically shaped by the relative abundances of operational taxonomic units, as determined by cost-effective phylogenetic marker gene profiles. While a substantial body of microbiome/microbiota research involves unsupervised and supervised Machine Learning, very little attention has been paid to feature selection and engineering. RESULTS: We here propose the first algorithm to exploit phylogenetic hierarchy (i.e. an all-encompassing taxonomy) in feature engineering for microbiota classification. The rationale is to exploit the often mono- or oligophyletic distribution of relevant (but hidden) traits by virtue of taxonomic abstraction. The algorithm is embedded in a comprehensive microbiota classification pipeline, which we applied to a diverse range of datasets, distinguishing healthy from diseased microbiota samples. CONCLUSION: We demonstrate substantial improvements over the state-of-the-art microbiota classification tools in terms of classification accuracy, regardless of the actual Machine Learning technique, while using drastically reduced feature spaces. Moreover, generalized features bear great explanatory value: they provide a concise description of conditions and thus help to provide pathophysiological insights. Indeed, the automatically and reproducibly derived features are consistent with previously published domain expert analyses.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2205-3) contains supplementary material, which is available to authorized users.
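The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of taxonomic abstraction it mentions: summing the relative abundances of operational taxonomic units (OTUs) at every ancestor in their lineage, so that higher-rank taxa become candidate features. It is not the authors' method. The function name `aggregate_by_taxonomy`, the lineage representation, and the toy sample are all hypothetical.

```python
from collections import defaultdict


def aggregate_by_taxonomy(abundances, lineages):
    """Sum OTU relative abundances at every ancestor in each OTU's lineage.

    abundances: dict mapping OTU id -> relative abundance in one sample
    lineages:   dict mapping OTU id -> tuple of taxon names, root first
    Returns a dict mapping each lineage prefix (a tuple) to its summed abundance.
    """
    totals = defaultdict(float)
    for otu, value in abundances.items():
        lineage = lineages[otu]
        # Credit this OTU's abundance to every rank above it, e.g.
        # ("Bacteria",), ("Bacteria", "Firmicutes"), and so on.
        for depth in range(1, len(lineage) + 1):
            totals[lineage[:depth]] += value
    return dict(totals)


# Hypothetical toy data: three OTUs in one sample (illustration only).
lineages = {
    "otu1": ("Bacteria", "Firmicutes", "Clostridia"),
    "otu2": ("Bacteria", "Firmicutes", "Bacilli"),
    "otu3": ("Bacteria", "Bacteroidetes", "Bacteroidia"),
}
sample = {"otu1": 0.25, "otu2": 0.25, "otu3": 0.5}

features = aggregate_by_taxonomy(sample, lineages)
for taxon, value in sorted(features.items()):
    print(" > ".join(taxon), value)
```

In this sketch each lineage prefix becomes one candidate feature; how such candidates are scored and selected is not stated in the abstract and is deliberately left out.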
format Online
Article
Text
id pubmed-6003080
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6003080 2018-07-06 Taxonomy-aware feature engineering for microbiome classification. Oudah, Mai; Henschel, Andreas. BMC Bioinformatics, Methodology Article. BioMed Central 2018-06-15 /pmc/articles/PMC6003080/ /pubmed/29907097 http://dx.doi.org/10.1186/s12859-018-2205-3 Text en
© The Author(s). 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Taxonomy-aware feature engineering for microbiome classification
topic Methodology Article