
Fecal source identification using random forest

Bibliographic Details
Main authors: Roguet, Adélaïde; Eren, A. Murat; Newton, Ryan J; McLellan, Sandra L
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6194674/
https://www.ncbi.nlm.nih.gov/pubmed/30336775
http://dx.doi.org/10.1186/s40168-018-0568-3
Abstract

BACKGROUND: Clostridiales and Bacteroidales are uniquely adapted to the gut environment and have co-evolved with their hosts, resulting in convergent microbiome patterns within mammalian species. As a result, members of Clostridiales and Bacteroidales are particularly suitable for identifying sources of fecal contamination in environmental samples. However, a comprehensive evaluation of their predictive power and the development of computational approaches are lacking. Given the global public health concern for waterborne disease, accurate identification of fecal pollution sources is essential for effective risk assessment and management. Here, we use the random forest algorithm and 16S rRNA gene amplicon sequences assigned to Clostridiales and Bacteroidales to identify common fecal pollution sources. We benchmarked the accuracy, consistency, and sensitivity of our classification approach using fecal, environmental, and artificial in silico generated samples.

RESULTS: Clostridiales and Bacteroidales classifiers were composed mainly of sequences that displayed differential distributions (host-preferred) among sewage, cow, deer, pig, cat, and dog sources. Each classifier correctly identified human and individual animal sources in approximately 90% of the fecal and environmental samples tested. Misclassifications resulted mostly from false-positive identification of cat and dog fecal signatures in host animals not used to build the classifiers, suggesting that characterization of additional animals would improve accuracy. Random forest predictions were highly reproducible, reflecting the consistency of the bacterial signatures within each of the animal and sewage sources. Using in silico generated samples, we could detect fecal bacterial signatures when the source dataset accounted for as little as ~0.5% of the assemblage, with ~0.04% of the sequences matching the classifiers. Finally, we developed a proxy to estimate proportions among sources, which allowed us to determine which sources contribute the most to observed fecal pollution.

CONCLUSION: Random forest classification with 16S rRNA gene amplicons offers a rapid, sensitive, and accurate solution for identifying host microbial signatures to detect human and animal fecal contamination in environmental samples.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-018-0568-3) contains supplementary material, which is available to authorized users.
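The abstract reports detection when the source accounted for ~0.5% of an in silico assemblage while only ~0.04% of the sequences matched the classifiers. The sketch below only illustrates those two quantities; it is not the authors' pipeline and does not reproduce the random forest step. The helpers `mix_in_silico` and `classifier_hit_fraction` are hypothetical, and it assumes a sample is a list with one sequence identifier per read and that a classifier can be summarized by the set of sequences it uses.

```python
import random
from collections import Counter


def mix_in_silico(source_reads, background_reads, source_fraction, depth, seed=0):
    """Hypothetical helper: build one in silico sample of `depth` reads.

    Assumes each pool is a list with one sequence identifier per read.
    A `source_fraction` share of the reads is drawn (without replacement)
    from the fecal source pool; the rest comes from the environmental
    background pool. random.Random.sample raises ValueError if a pool is
    smaller than the number of reads requested from it.
    """
    if not 0.0 <= source_fraction <= 1.0:
        raise ValueError("source_fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the mixture is reproducible
    n_source = round(depth * source_fraction)
    n_background = depth - n_source
    reads = rng.sample(source_reads, n_source) + rng.sample(background_reads, n_background)
    return Counter(reads)  # sequence identifier -> read count


def classifier_hit_fraction(sample_counts, classifier_sequences):
    """Hypothetical helper: share of reads whose sequence is one of the
    (assumed) host-preferred sequences used by a classifier."""
    total = sum(sample_counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for seq, n in sample_counts.items() if seq in classifier_sequences)
    return hits / total


# Toy pools (invented identifiers, not data from the study).
sewage_pool = ["sewage_A"] * 600 + ["sewage_B"] * 300 + ["shared_C"] * 100
river_pool = ["river_X"] * 5000 + ["shared_C"] * 5000

sample = mix_in_silico(sewage_pool, river_pool, source_fraction=0.005, depth=10000)
print(sum(sample.values()))  # 10000 reads in total
print(classifier_hit_fraction(sample, {"sewage_A", "sewage_B"}))  # at most 0.005
```

Sampling without replacement at a fixed depth is one simple way to mimic a rarefied amplicon library. The hit fraction stays below the source fraction because only part of the source reads are classifier sequences, which is the same kind of gap the abstract reports between ~0.5% and ~0.04%.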
Record source: PubMed (MEDLINE/PubMed record pubmed-6194674), National Center for Biotechnology Information
Published in: Microbiome (Research), BioMed Central, 2018-10-18
© The Author(s). 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.