Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition

Microbial communities are ubiquitous and often influence macroscopic properties of the ecosystems they inhabit. However, deciphering the functional relationship between specific microbes and ecosystem properties is an ongoing challenge owing to the complexity of the communities. This challenge can b...

Bibliographic Details
Main authors: Thompson, Jaron; Johansen, Renee; Dunbar, John; Munsky, Brian
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6602172/
https://www.ncbi.nlm.nih.gov/pubmed/31260460
http://dx.doi.org/10.1371/journal.pone.0215502
_version_ 1783431346164596736
author Thompson, Jaron
Johansen, Renee
Dunbar, John
Munsky, Brian
collection PubMed
description Microbial communities are ubiquitous and often influence macroscopic properties of the ecosystems they inhabit. However, deciphering the functional relationship between specific microbes and ecosystem properties is an ongoing challenge owing to the complexity of the communities. This challenge can be addressed, in part, by integrating advances in DNA sequencing technology with computational approaches such as machine learning. Although machine learning techniques have been applied to microbiome data, their use remains rare, and user-friendly platforms for implementing them are not widely available. We developed a tool that implements neural network and random forest models to perform regression and feature selection tasks on microbiome data. In this study, we applied the tool to analyze soil microbiome (16S rRNA gene profiles) and dissolved organic carbon (DOC) data from a 44-day plant litter decomposition experiment. The microbiome data include 1709 total bacterial operational taxonomic units (OTUs) from 300+ microcosms. Regression analysis of predicted and actual DOC for a held-out test set of 51 samples yields Pearson’s correlation coefficients of 0.636 and 0.676 for the neural network and random forest approaches, respectively. Important taxa identified by the machine learning techniques are compared to results from a standard tool (indicator species analysis) widely used by microbial ecologists. Of 1709 bacterial taxa, indicator species analysis identified 285 taxa as significant determinants of DOC concentration. Of the top 285 ranked features determined by machine learning methods, a subset of 86 taxa are common to all feature selection techniques. Using this subset of features, prediction results for random permutations of the data set are at least as accurate as predictions made using the entire feature set. Our results suggest that integration of multiple methods can aid identification of a robust subset of taxa within complex communities that may drive specific functional outcomes of interest.
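The two quantitative steps named in the description (scoring held-out predictions with Pearson's correlation coefficient, and keeping only the taxa that every feature selection technique ranks in its top k) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tool: the function names `pearson_r` and `consensus_top_k`, the OTU labels, and all numeric scores are hypothetical, and it assumes each method yields one numeric importance score per taxon, with higher meaning more important.

```python
import math


def pearson_r(predicted, actual):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Covariance and variances are left unnormalized; the 1/n factors cancel.
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_a)


def consensus_top_k(scores_by_method, k):
    """Return the features that every method ranks among its top k.

    scores_by_method maps a method name to a dict of {feature: score},
    where a higher score is assumed to mean a more important feature.
    """
    top_sets = []
    for scores in scores_by_method.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        top_sets.append(set(ranked[:k]))
    return set.intersection(*top_sets)


# Hypothetical importance scores for three made-up OTUs (not study data).
toy_scores = {
    "neural_network": {"otu_a": 0.9, "otu_b": 0.5, "otu_c": 0.1},
    "random_forest": {"otu_a": 0.7, "otu_b": 0.1, "otu_c": 0.6},
    "indicator_species": {"otu_a": 0.8, "otu_b": 0.3, "otu_c": 0.2},
}
shared = consensus_top_k(toy_scores, k=2)
```

In the study's terms, k would be 285 and the resulting set would correspond to the 86 taxa shared by all techniques; how the original tool derives and ranks importance scores is not specified in this record.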
format Online
Article
Text
id pubmed-6602172
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6602172 2019-07-12 Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition Thompson, Jaron Johansen, Renee Dunbar, John Munsky, Brian PLoS One Research Article Public Library of Science 2019-07-01 /pmc/articles/PMC6602172/ /pubmed/31260460 http://dx.doi.org/10.1371/journal.pone.0215502 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition
topic Research Article