Decoding the language of microbiomes using word-embedding techniques, and applications in inflammatory bowel disease

Microbiomes are complex ecological systems that play crucial roles in understanding natural phenomena from human disease to climate change. Especially in human gut microbiome studies, where collecting clinical samples can be arduous, the number of taxa considered in any one study often exceeds the n...

Bibliographic Details
Main authors: Tataru, Christine A., David, Maude M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7244183/
https://www.ncbi.nlm.nih.gov/pubmed/32365061
http://dx.doi.org/10.1371/journal.pcbi.1007859
author Tataru, Christine A.
David, Maude M.
collection PubMed
description Microbiomes are complex ecological systems that play crucial roles in understanding natural phenomena from human disease to climate change. Especially in human gut microbiome studies, where collecting clinical samples can be arduous, the number of taxa considered in any one study often exceeds the number of samples ten to one hundred-fold. This discrepancy decreases the power of studies to identify meaningful differences between samples, increases the likelihood of false positive results, and subsequently limits reproducibility. Despite the vast collections of microbiome data already available, biome-specific patterns of microbial structure are not currently leveraged to inform studies. Here, we derive microbiome-level properties by applying an embedding algorithm to quantify taxon co-occurrence patterns in over 18,000 samples from the American Gut Project (AGP) microbiome crowdsourcing effort. We then compare the predictive power of models trained using properties, normalized taxonomic count data, and another commonly used dimensionality reduction method, Principal Component Analysis, in categorizing samples from individuals with inflammatory bowel disease (IBD) and healthy controls. We show that predictive models trained using property data are the most accurate, robust, and generalizable, and that property-based models can be trained on one dataset and deployed on another with positive results. Furthermore, we find that properties correlate significantly with known metabolic pathways. Using these properties, we are able to extract known and new bacterial metabolic pathways associated with inflammatory bowel disease across two completely independent studies. By providing a set of pre-trained embeddings, we allow any V4 16S amplicon study to apply the publicly informed properties to increase the statistical power, reproducibility, and generalizability of analysis.
format Online
Article
Text
id pubmed-7244183
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7244183 2020-06-05 PLoS Comput Biol Research Article Public Library of Science 2020-05-04 /pmc/articles/PMC7244183/ /pubmed/32365061 http://dx.doi.org/10.1371/journal.pcbi.1007859 Text en © 2020 Tataru, David http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Decoding the language of microbiomes using word-embedding techniques, and applications in inflammatory bowel disease
topic Research Article
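The description in this record outlines a workflow: learn taxon embeddings from co-occurrence patterns in a large reference collection, then re-express each sample of a new study in that lower-dimensional "property" space. The following is a minimal sketch of that idea, not the authors' implementation. It swaps in a plain SVD of a log-scaled co-occurrence matrix as a stand-in for the embedding algorithm, uses synthetic Poisson counts in place of real 16S data, and the function names `taxon_embeddings` and `sample_properties` are hypothetical.

```python
import numpy as np


def taxon_embeddings(counts, dim):
    """Derive a (n_taxa, dim) embedding from a (n_samples, n_taxa) count table.

    Assumption: SVD of log(1 + co-occurrence) stands in for the embedding
    algorithm mentioned in the abstract.
    """
    present = (counts > 0).astype(float)
    cooc = present.T @ present        # taxa x taxa: samples where both occur
    np.fill_diagonal(cooc, 0.0)       # ignore a taxon co-occurring with itself
    u, s, _ = np.linalg.svd(np.log1p(cooc))
    return u[:, :dim] * np.sqrt(s[:dim])


def sample_properties(counts, embeddings):
    """Project samples into property space via relative abundances."""
    totals = counts.sum(axis=1, keepdims=True)
    rel = counts / np.maximum(totals, 1)   # guard against empty samples
    return rel @ embeddings


# Synthetic stand-ins: a large reference collection and a small study
# that share the same 30 taxa (columns).
rng = np.random.default_rng(0)
reference = rng.poisson(0.5, size=(200, 30))
study = rng.poisson(0.5, size=(12, 30))

emb = taxon_embeddings(reference, dim=5)   # learned once, reusable
props = sample_properties(study, emb)      # 12 samples x 5 properties
```

The point of the design is that `emb` is computed once from the reference data and then reused, so a small study with far more taxa than samples is modeled with only `dim` features per sample.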