Microbiome-based disease prediction with multimodal variational information bottlenecks

Bibliographic Details
Main authors: Grazioli, Filippo; Siarheyeu, Raman; Alqassem, Israa; Henschel, Andreas; Pileggi, Giampaolo; Meiser, Andrea
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9022840/
https://www.ncbi.nlm.nih.gov/pubmed/35404958
http://dx.doi.org/10.1371/journal.pcbi.1010050
author Grazioli, Filippo
Siarheyeu, Raman
Alqassem, Israa
Henschel, Andreas
Pileggi, Giampaolo
Meiser, Andrea
collection PubMed
description Scientific research is shedding light on the interaction of the gut microbiome with the human host and on its role in human health. Existing machine learning methods have shown great potential in discriminating healthy from diseased microbiome states. Most of them leverage shotgun metagenomic sequencing to extract gut microbial species-relative abundances or strain-level markers. Each of these gut microbial profiling modalities has shown diagnostic potential when tested separately; however, no existing approach combines them in a single predictive framework. Here, we propose the Multimodal Variational Information Bottleneck (MVIB), a novel deep learning model capable of learning a joint representation of multiple heterogeneous data modalities. MVIB achieves competitive classification performance while being faster than existing methods. Additionally, MVIB offers interpretable results. Our model adopts an information-theoretic interpretation of deep neural networks and computes a joint stochastic encoding of different input data modalities. We use MVIB to predict whether human hosts are affected by a given disease by jointly analysing gut microbial species-relative abundances and strain-level markers. MVIB is evaluated on human gut metagenomic samples from 11 publicly available disease cohorts covering 6 different diseases. We achieve high performance (0.80 < ROC AUC < 0.95) on 5 cohorts and at least medium performance on the remaining ones. We adopt a saliency technique to interpret the output of MVIB and identify the microbial species and strain-level markers most relevant to the model's predictions. We also perform cross-study generalisation experiments, where we train and test MVIB on different cohorts of the same disease, and overall we achieve results comparable to the baseline approach, i.e. Random Forest. Further, we evaluate our model by adding metabolomic data derived from mass spectrometry as a third input modality. 
Our method is scalable with respect to input data modalities and has an average training time of < 1.4 seconds. The source code and the datasets used in this work are publicly available.
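The description says MVIB computes a joint stochastic encoding of several input modalities under an information bottleneck objective. The following is a minimal pure-Python sketch of that general idea, not the authors' implementation. It assumes that (1) each modality encoder outputs a diagonal Gaussian (mean and log-variance per latent dimension), (2) the per-modality Gaussians are fused by a product of experts that includes a standard-normal prior expert, and (3) the training loss is binary cross-entropy plus β times the KL divergence to N(0, I). The function names, the example encoder outputs, and the decoder weights are hypothetical; the published source code is the reference for the actual model.

```python
import math
import random
from typing import List, Tuple

# (means, log-variances) for one diagonal Gaussian over the latent space
Gaussian = Tuple[List[float], List[float]]


def product_of_experts(experts: List[Gaussian]) -> Gaussian:
    """Fuse diagonal Gaussians by multiplying their densities.

    Assumption: a standard-normal prior expert N(0, I) is included, so the
    fused precision in each dimension is 1 + the sum of the experts' precisions.
    """
    dim = len(experts[0][0])
    fused_mu, fused_logvar = [], []
    for d in range(dim):
        precision_sum = 1.0  # prior expert: mean 0, variance 1
        weighted_mu = 0.0    # prior mean is 0, so it adds nothing here
        for mu, logvar in experts:
            precision = math.exp(-logvar[d])
            precision_sum += precision
            weighted_mu += mu[d] * precision
        fused_mu.append(weighted_mu / precision_sum)
        fused_logvar.append(-math.log(precision_sum))
    return fused_mu, fused_logvar


def kl_to_standard_normal(mu: List[float], logvar: List[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I))."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def reparameterize(mu: List[float], logvar: List[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def ib_loss(prob_disease: float, label: int, kl: float, beta: float) -> float:
    """Binary cross-entropy (prediction term) plus beta-weighted KL (compression term)."""
    eps = 1e-12
    p = min(max(prob_disease, eps), 1.0 - eps)
    ce = -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    return ce + beta * kl


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical encoder outputs for one sample in a 2-D latent space.
    species = ([0.8, -0.2], [0.0, 0.0])    # species-abundance encoder, variance 1
    strains = ([1.2, 0.4], [-1.0, -1.0])   # strain-marker encoder, lower variance
    mu, logvar = product_of_experts([species, strains])
    z = reparameterize(mu, logvar, rng)
    # Hypothetical logistic-regression decoder on the sampled latent code.
    w, b = [1.5, -0.5], 0.1
    prob = 1.0 / (1.0 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))
    loss = ib_loss(prob, label=1, kl=kl_to_standard_normal(mu, logvar), beta=1e-3)
    print(f"fused mean={mu}, loss={loss:.3f}")
```

In this sketch, a modality with lower variance (higher precision) pulls the fused mean toward its own estimate. Adding a third modality, such as the metabolomic profiles mentioned above, only means appending another expert to the list, which is one way a model of this kind can scale with the number of modalities.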
format Online
Article
Text
id pubmed-9022840
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9022840 2022-04-22 PLoS Comput Biol Research Article. Public Library of Science 2022-04-11. Text en © 2022 Grazioli et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Microbiome-based disease prediction with multimodal variational information bottlenecks
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9022840/
https://www.ncbi.nlm.nih.gov/pubmed/35404958
http://dx.doi.org/10.1371/journal.pcbi.1010050