IDMIL: an alignment-free Interpretable Deep Multiple Instance Learning (MIL) for predicting disease from whole-metagenomic data

Bibliographic Details
Main Authors: Rahman, Mohammad Arifur, Rangwala, Huzefa
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects: Bioinformatics of Microbes and Microbiomes
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355246/
https://www.ncbi.nlm.nih.gov/pubmed/32657370
http://dx.doi.org/10.1093/bioinformatics/btaa477
author Rahman, Mohammad Arifur
Rangwala, Huzefa
collection PubMed
description MOTIVATION: The human body hosts more microbial organisms than human cells. Analysis of this microbial diversity provides key insight into the role these microorganisms play in human health. Metagenomics is the collective DNA sequencing of coexisting microbial organisms in an environmental sample or a host. It has applications in precision medicine, agriculture, environmental science and forensics. State-of-the-art predictive models for phenotype prediction from metagenomic data rely on alignment, assembly, extensive pruning, taxonomic profiling and reference sequence databases. These processes are time-consuming and do not consider novel microbial sequences when aligned with the reference genome, limiting the potential of whole metagenomics. We formulate the problem of predicting human disease from whole-metagenomic data using Multiple Instance Learning (MIL), a popular supervised learning paradigm. Our proposed alignment-free approach provides higher prediction accuracy by harnessing the capability of a deep convolutional neural network (CNN) within an MIL framework, and provides interpretability via a neural attention mechanism. RESULTS: The MIL formulation combined with the hierarchical feature extraction capability of a deep CNN provides significantly better predictive performance than popular existing approaches. The attention mechanism allows for the identification of groups of sequences that are likely to be correlated with diseases, providing much-needed interpretation. Our proposed approach does not rely on alignment, assembly or reference sequence databases, making it fast and scalable for large-scale metagenomic data. We evaluate our method on well-known large-scale metagenomic studies and show that our proposed approach outperforms comparable state-of-the-art methods for disease prediction. AVAILABILITY AND IMPLEMENTATION: https://github.com/mrahma23/IDMIL.
format Online
Article
Text
id pubmed-7355246
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7355246 2020-07-16 IDMIL: an alignment-free Interpretable Deep Multiple Instance Learning (MIL) for predicting disease from whole-metagenomic data Rahman, Mohammad Arifur Rangwala, Huzefa Bioinformatics Bioinformatics of Microbes and Microbiomes Oxford University Press 2020-07 2020-07-13 /pmc/articles/PMC7355246/ /pubmed/32657370 http://dx.doi.org/10.1093/bioinformatics/btaa477 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title IDMIL: an alignment-free Interpretable Deep Multiple Instance Learning (MIL) for predicting disease from whole-metagenomic data
topic Bioinformatics of Microbes and Microbiomes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355246/
https://www.ncbi.nlm.nih.gov/pubmed/32657370
http://dx.doi.org/10.1093/bioinformatics/btaa477