Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition
Microbial communities are ubiquitous and often influence macroscopic properties of the ecosystems they inhabit. However, deciphering the functional relationship between specific microbes and ecosystem properties is an ongoing challenge owing to the complexity of the communities. This challenge can b...
Main authors: | Thompson, Jaron; Johansen, Renee; Dunbar, John; Munsky, Brian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6602172/ https://www.ncbi.nlm.nih.gov/pubmed/31260460 http://dx.doi.org/10.1371/journal.pone.0215502 |
_version_ | 1783431346164596736 |
---|---|
author | Thompson, Jaron Johansen, Renee Dunbar, John Munsky, Brian |
author_facet | Thompson, Jaron Johansen, Renee Dunbar, John Munsky, Brian |
author_sort | Thompson, Jaron |
collection | PubMed |
description | Microbial communities are ubiquitous and often influence macroscopic properties of the ecosystems they inhabit. However, deciphering the functional relationship between specific microbes and ecosystem properties is an ongoing challenge owing to the complexity of the communities. This challenge can be addressed, in part, by integrating the advances in DNA sequencing technology with computational approaches like machine learning. Although machine learning techniques have been applied to microbiome data, use of these techniques remains rare, and user-friendly platforms to implement such techniques are not widely available. We developed a tool that implements neural network and random forest models to perform regression and feature selection tasks on microbiome data. In this study, we applied the tool to analyze soil microbiome (16S rRNA gene profiles) and dissolved organic carbon (DOC) data from a 44-day plant litter decomposition experiment. The microbiome data includes 1709 total bacterial operational taxonomic units (OTUs) from 300+ microcosms. Regression analysis of predicted and actual DOC for a held-out test set of 51 samples yields Pearson’s correlation coefficients of 0.636 and 0.676 for neural network and random forest approaches, respectively. Important taxa identified by the machine learning techniques are compared to results from a standard tool (indicator species analysis) widely used by microbial ecologists. Of 1709 bacterial taxa, indicator species analysis identified 285 taxa as significant determinants of DOC concentration. Of the top 285 ranked features determined by machine learning methods, a subset of 86 taxa are common to all feature selection techniques. Using this subset of features, prediction results for random permutations of the data set are at least equally accurate compared to predictions determined using the entire feature set. Our results suggest that integration of multiple methods can aid identification of a robust subset of taxa within complex communities that may drive specific functional outcomes of interest. |
format | Online Article Text |
id | pubmed-6602172 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6602172 2019-07-12 Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition Thompson, Jaron Johansen, Renee Dunbar, John Munsky, Brian PLoS One Research Article Microbial communities are ubiquitous and often influence macroscopic properties of the ecosystems they inhabit. However, deciphering the functional relationship between specific microbes and ecosystem properties is an ongoing challenge owing to the complexity of the communities. This challenge can be addressed, in part, by integrating the advances in DNA sequencing technology with computational approaches like machine learning. Although machine learning techniques have been applied to microbiome data, use of these techniques remains rare, and user-friendly platforms to implement such techniques are not widely available. We developed a tool that implements neural network and random forest models to perform regression and feature selection tasks on microbiome data. In this study, we applied the tool to analyze soil microbiome (16S rRNA gene profiles) and dissolved organic carbon (DOC) data from a 44-day plant litter decomposition experiment. The microbiome data includes 1709 total bacterial operational taxonomic units (OTUs) from 300+ microcosms. Regression analysis of predicted and actual DOC for a held-out test set of 51 samples yields Pearson’s correlation coefficients of 0.636 and 0.676 for neural network and random forest approaches, respectively. Important taxa identified by the machine learning techniques are compared to results from a standard tool (indicator species analysis) widely used by microbial ecologists. Of 1709 bacterial taxa, indicator species analysis identified 285 taxa as significant determinants of DOC concentration. Of the top 285 ranked features determined by machine learning methods, a subset of 86 taxa are common to all feature selection techniques. Using this subset of features, prediction results for random permutations of the data set are at least equally accurate compared to predictions determined using the entire feature set. Our results suggest that integration of multiple methods can aid identification of a robust subset of taxa within complex communities that may drive specific functional outcomes of interest. Public Library of Science 2019-07-01 /pmc/articles/PMC6602172/ /pubmed/31260460 http://dx.doi.org/10.1371/journal.pone.0215502 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication. |
spellingShingle | Research Article Thompson, Jaron Johansen, Renee Dunbar, John Munsky, Brian Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition |
title | Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition |
title_full | Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition |
title_fullStr | Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition |
title_full_unstemmed | Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition |
title_short | Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition |
title_sort | machine learning to predict microbial community functions: an analysis of dissolved organic carbon from litter decomposition |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6602172/ https://www.ncbi.nlm.nih.gov/pubmed/31260460 http://dx.doi.org/10.1371/journal.pone.0215502 |
work_keys_str_mv | AT thompsonjaron machinelearningtopredictmicrobialcommunityfunctionsananalysisofdissolvedorganiccarbonfromlitterdecomposition AT johansenrenee machinelearningtopredictmicrobialcommunityfunctionsananalysisofdissolvedorganiccarbonfromlitterdecomposition AT dunbarjohn machinelearningtopredictmicrobialcommunityfunctionsananalysisofdissolvedorganiccarbonfromlitterdecomposition AT munskybrian machinelearningtopredictmicrobialcommunityfunctionsananalysisofdissolvedorganiccarbonfromlitterdecomposition |
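
The description in this record mentions two computations that are easy to illustrate: scoring predictions against measured DOC with Pearson's correlation coefficient, and keeping only the taxa that rank in the top k for every feature selection method. The following is a minimal sketch of those two steps in plain Python, not the authors' tool. The function names (`pearson_r`, `top_k`, `consensus_features`), the OTU labels, and all scores are hypothetical and invented for illustration.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_p)


def top_k(scores, k):
    """Names of the k highest-scoring features in a {name: score} mapping."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def consensus_features(score_tables, k):
    """Features that appear in the top k of every score table."""
    return set.intersection(*(top_k(s, k) for s in score_tables))


# Toy importance scores, invented for illustration (not values from the study).
rf_scores = {"otu_a": 0.40, "otu_b": 0.30, "otu_c": 0.20, "otu_d": 0.10}
nn_scores = {"otu_a": 0.35, "otu_b": 0.05, "otu_c": 0.45, "otu_d": 0.15}

# Keep only the taxa ranked in the top 2 by both hypothetical methods.
shared = consensus_features([rf_scores, nn_scores], k=2)
```

In the study described above, the analogous cutoff was the top 285 features per method, with 86 taxa common to all of them. The sketch assumes each method exposes one numeric importance score per taxon, which may not match how the published tool ranks features.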