Ecologically informed microbial biomarkers and accurate classification of mixed and unmixed samples in an extensive cross-study of human body sites

BACKGROUND: The identification of body site-specific microbial biomarkers and their use for classification tasks have promising applications in medicine, microbial ecology, and forensics. Previous studies have characterized site-specific microbiota and shown that sample origin can be accurately pred...

Bibliographic Details
Main authors: Tackmann, Janko; Arora, Natasha; Schmidt, Thomas Sebastian Benedikt; Rodrigues, João Frederico Matias; von Mering, Christian
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6201589/
https://www.ncbi.nlm.nih.gov/pubmed/30355348
http://dx.doi.org/10.1186/s40168-018-0565-6
author Tackmann, Janko
Arora, Natasha
Schmidt, Thomas Sebastian Benedikt
Rodrigues, João Frederico Matias
von Mering, Christian
collection PubMed
description BACKGROUND: The identification of body site-specific microbial biomarkers and their use for classification tasks have promising applications in medicine, microbial ecology, and forensics. Previous studies have characterized site-specific microbiota and shown that sample origin can be accurately predicted by microbial content. However, these studies were usually restricted to single datasets with consistent experimental methods and conditions, as well as comparatively small sample numbers. The effects of study-specific biases and statistical power on classification performance and biomarker identification thus remain poorly understood. Furthermore, reliable detection in mixtures of different body sites or with noise from environmental contamination has rarely been investigated thus far. Finally, the impact of ecological associations between microbes on biomarker discovery was usually not considered in previous work. RESULTS: Here we present the analysis of one of the largest cross-study sequencing datasets of microbial communities from human body sites (15,082 samples from 57 publicly available studies). We show that training a Random Forest Classifier on this aggregated dataset increases prediction performance for body sites by 35% compared to a single-study classifier. Using simulated datasets, we further demonstrate that the source of different microbial contributions in mixtures of different body sites or with soil can be detected starting at 1% of the total microbial community. We apply a biomarker selection method that excludes indirect environmental associations driven by microbe-microbe associations, yielding a parsimonious set of highly predictive taxa including novel biomarkers and excluding many previously reported taxa. We find a considerable fraction of unclassified biomarkers (“microbial dark matter”) and observe that negatively associated taxa have a surprisingly high impact on classification performance. 
We further detect a significant enrichment of rod-shaped, motile, and sporulating taxa for feces biomarkers, consistent with a highly competitive environment. CONCLUSIONS: Our machine learning model shows strong body site classification performance, both in single-source samples and mixtures, making it promising for tasks requiring high accuracy, such as forensic applications. We report a core set of ecologically informed biomarkers, inferred across a wide range of experimental protocols and conditions, providing the most concise, general, and least biased overview of body site-associated microbes to date. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-018-0565-6) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6201589
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6201589 2018-10-31 Ecologically informed microbial biomarkers and accurate classification of mixed and unmixed samples in an extensive cross-study of human body sites Tackmann, Janko Arora, Natasha Schmidt, Thomas Sebastian Benedikt Rodrigues, João Frederico Matias von Mering, Christian Microbiome Research BioMed Central 2018-10-24 /pmc/articles/PMC6201589/ /pubmed/30355348 http://dx.doi.org/10.1186/s40168-018-0565-6 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Tackmann, Janko
Arora, Natasha
Schmidt, Thomas Sebastian Benedikt
Rodrigues, João Frederico Matias
von Mering, Christian
Ecologically informed microbial biomarkers and accurate classification of mixed and unmixed samples in an extensive cross-study of human body sites
title Ecologically informed microbial biomarkers and accurate classification of mixed and unmixed samples in an extensive cross-study of human body sites
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6201589/
https://www.ncbi.nlm.nih.gov/pubmed/30355348
http://dx.doi.org/10.1186/s40168-018-0565-6