Decoding the language of microbiomes using word-embedding techniques, and applications in inflammatory bowel disease
Microbiomes are complex ecological systems that play crucial roles in understanding natural phenomena from human disease to climate change. Especially in human gut microbiome studies, where collecting clinical samples can be arduous, the number of taxa considered in any one study often exceeds the n...
Main authors: | Tataru, Christine A.; David, Maude M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7244183/ https://www.ncbi.nlm.nih.gov/pubmed/32365061 http://dx.doi.org/10.1371/journal.pcbi.1007859 |
_version_ | 1783537534774542336 |
---|---|
author | Tataru, Christine A. David, Maude M. |
author_facet | Tataru, Christine A. David, Maude M. |
author_sort | Tataru, Christine A. |
collection | PubMed |
description | Microbiomes are complex ecological systems that play crucial roles in understanding natural phenomena from human disease to climate change. Especially in human gut microbiome studies, where collecting clinical samples can be arduous, the number of taxa considered in any one study often exceeds the number of samples ten to one hundred-fold. This discrepancy decreases the power of studies to identify meaningful differences between samples, increases the likelihood of false-positive results, and subsequently limits reproducibility. Despite the vast collections of microbiome data already available, biome-specific patterns of microbial structure are not currently leveraged to inform studies. Here, we derive microbiome-level properties by applying an embedding algorithm to quantify taxon co-occurrence patterns in over 18,000 samples from the American Gut Project (AGP) microbiome crowdsourcing effort. We then compare the predictive power of models trained using properties, normalized taxonomic count data, and another commonly used dimensionality reduction method, Principal Component Analysis, in categorizing samples from individuals with inflammatory bowel disease (IBD) and healthy controls. We show that predictive models trained using property data are the most accurate, robust, and generalizable, and that property-based models can be trained on one dataset and deployed on another with positive results. Furthermore, we find that properties correlate significantly with known metabolic pathways. Using these properties, we are able to extract known and new bacterial metabolic pathways associated with inflammatory bowel disease across two completely independent studies. By providing a set of pre-trained embeddings, we allow any V4 16S amplicon study to apply the publicly informed properties to increase the statistical power, reproducibility, and generalizability of analysis. |
format | Online Article Text |
id | pubmed-7244183 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-7244183 2020-06-05 Decoding the language of microbiomes using word-embedding techniques, and applications in inflammatory bowel disease Tataru, Christine A. David, Maude M. PLoS Comput Biol Research Article Public Library of Science 2020-05-04 /pmc/articles/PMC7244183/ /pubmed/32365061 http://dx.doi.org/10.1371/journal.pcbi.1007859 Text en © 2020 Tataru, David http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Tataru, Christine A. David, Maude M. Decoding the language of microbiomes using word-embedding techniques, and applications in inflammatory bowel disease |
title | Decoding the language of microbiomes using word-embedding techniques, and applications in inflammatory bowel disease |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7244183/ https://www.ncbi.nlm.nih.gov/pubmed/32365061 http://dx.doi.org/10.1371/journal.pcbi.1007859 |
work_keys_str_mv | AT tataruchristinea decodingthelanguageofmicrobiomesusingwordembeddingtechniquesandapplicationsininflammatoryboweldisease AT davidmaudem decodingthelanguageofmicrobiomesusingwordembeddingtechniquesandapplicationsininflammatoryboweldisease |
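
The description above mentions deriving "properties" by embedding taxon co-occurrence patterns and then representing samples in that property space. The following is a minimal sketch of that general idea, not the authors' pipeline: the record does not name the embedding algorithm, so a truncated SVD of the log co-occurrence matrix is swapped in here, and the projection of samples as relative abundances multiplied by taxon embeddings is an assumption. All function names and the toy count table are hypothetical.

```python
import numpy as np


def cooccurrence(counts: np.ndarray) -> np.ndarray:
    """Taxon-by-taxon matrix: number of samples in which both taxa are present."""
    present = (counts > 0).astype(float)  # samples x taxa presence/absence
    co = present.T @ present              # taxa x taxa co-occurrence counts
    np.fill_diagonal(co, 0.0)             # ignore self co-occurrence
    return co


def embed_taxa(co: np.ndarray, dim: int) -> np.ndarray:
    """Embed each taxon with a truncated SVD of log(1 + co-occurrence).

    Assumption: SVD stands in for whichever embedding algorithm the study used.
    """
    u, s, _ = np.linalg.svd(np.log1p(co), full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])  # taxa x dim


def sample_properties(counts: np.ndarray, taxa_emb: np.ndarray) -> np.ndarray:
    """Project samples into property space (assumed: relative abundance @ embeddings)."""
    totals = counts.sum(axis=1, keepdims=True)
    rel = counts / np.where(totals == 0, 1, totals)  # guard against empty samples
    return rel @ taxa_emb                            # samples x dim


# Toy data only: 20 samples, 50 taxa, random counts (not real microbiome data).
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(20, 50))

co = cooccurrence(counts)
emb = embed_taxa(co, dim=5)
props = sample_properties(counts, emb)
print(props.shape)  # (20, 5)
```

In this sketch the 50 taxon columns are reduced to 5 property columns per sample, which is the kind of dimensionality reduction the description compares against normalized counts and Principal Component Analysis when training classifiers.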