SAELGMDA: Identifying human microbe–disease associations based on sparse autoencoder and LightGBM

INTRODUCTION: Identification of complex associations between diseases and microbes is important to understand the pathogenesis of diseases and design therapeutic strategies. Biomedical experiment-based Microbe-Disease Association (MDA) detection methods are expensive, time-consuming, and laborious....

Bibliographic Details
Main authors: Wang, Feixiang; Yang, Huandong; Wu, Yan; Peng, Lihong; Li, Xiaoling
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10320730/
https://www.ncbi.nlm.nih.gov/pubmed/37415823
http://dx.doi.org/10.3389/fmicb.2023.1207209
_version_ 1785068497457381376
author Wang, Feixiang
Yang, Huandong
Wu, Yan
Peng, Lihong
Li, Xiaoling
collection PubMed
description INTRODUCTION: Identifying complex associations between diseases and microbes is important for understanding the pathogenesis of diseases and designing therapeutic strategies. Biomedical experiment-based Microbe-Disease Association (MDA) detection methods are expensive, time-consuming, and laborious. METHODS: Here, we developed a computational method called SAELGMDA for potential MDA prediction. First, microbe similarity and disease similarity are computed by integrating their functional similarity and Gaussian interaction profile kernel similarity. Second, each microbe-disease pair is represented as a feature vector by combining the microbe and disease similarity matrices. Next, the obtained feature vectors are mapped to a low-dimensional space by a Sparse AutoEncoder. Finally, unknown microbe-disease pairs are classified by a Light Gradient Boosting Machine (LightGBM). RESULTS: The proposed SAELGMDA method was compared with four state-of-the-art MDA methods (MNNMDA, GATMDA, NTSHMDA, and LRLSHMDA) under five-fold cross-validation on diseases, microbes, and microbe-disease pairs on the HMDAD and Disbiome databases. The results show that SAELGMDA achieved the best accuracy, Matthews correlation coefficient, AUC, and AUPR under the majority of conditions, outperforming the other four MDA prediction models. In particular, SAELGMDA obtained the best AUCs of 0.8358 and 0.9301 under cross-validation on diseases, 0.9838 and 0.9293 under cross-validation on microbes, and 0.9857 and 0.9358 under cross-validation on microbe-disease pairs on the HMDAD and Disbiome databases, respectively. Colorectal cancer, inflammatory bowel disease, and lung cancer are diseases that severely threaten human health. We used the proposed SAELGMDA method to find possible microbes for the three diseases. The results suggest potential associations between Clostridium coccoides and colorectal cancer and between Sphingomonadaceae and inflammatory bowel disease. In addition, Veillonella may be associated with autism. The inferred MDAs need further validation. CONCLUSION: We anticipate that the proposed SAELGMDA method contributes to the identification of new MDAs.
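The first two METHODS steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the kernel follows the commonly used Gaussian interaction profile definition, with the bandwidth normalised by the mean squared profile norm. The function names, the toy association matrix, and the choice to build a pair's feature vector by concatenating the two similarity rows are assumptions, since the abstract only says that the similarity matrices are combined. Functional similarity, the sparse autoencoder, and the LightGBM classifier are left out.

```python
import math


def gip_kernel(profiles, bandwidth=1.0):
    """Gaussian interaction profile kernel over the rows of `profiles`.

    Each row is the 0/1 association profile of one entity. `bandwidth`
    is an assumed default; the abstract does not state the value used.
    """
    n = len(profiles)
    sq_norms = [sum(x * x for x in row) for row in profiles]
    mean_sq_norm = sum(sq_norms) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = bandwidth / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Squared Euclidean distance between the two profiles
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


def transpose(matrix):
    """Swap rows and columns of a rectangular list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def pair_features(microbe_sim, disease_sim, microbe_idx, disease_idx):
    """Assumed pair encoding: microbe similarity row + disease similarity row."""
    return list(microbe_sim[microbe_idx]) + list(disease_sim[disease_idx])


# Hypothetical toy data: 3 microbes (rows) x 2 diseases (columns)
associations = [[1, 0], [1, 0], [0, 1]]
microbe_sim = gip_kernel(associations)
disease_sim = gip_kernel(transpose(associations))
features = pair_features(microbe_sim, disease_sim, microbe_idx=0, disease_idx=1)
```

With this encoding, a pair's feature vector has one entry per microbe plus one per disease; in the described pipeline such vectors would then be compressed by the autoencoder before classification.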
format Online
Article
Text
id pubmed-10320730
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10320730 2023-07-06 SAELGMDA: Identifying human microbe–disease associations based on sparse autoencoder and LightGBM Wang, Feixiang Yang, Huandong Wu, Yan Peng, Lihong Li, Xiaoling Front Microbiol Microbiology Frontiers Media S.A. 2023-06-21 /pmc/articles/PMC10320730/ /pubmed/37415823 http://dx.doi.org/10.3389/fmicb.2023.1207209 Text en Copyright © 2023 Wang, Yang, Wu, Peng and Li. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title SAELGMDA: Identifying human microbe–disease associations based on sparse autoencoder and LightGBM
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10320730/
https://www.ncbi.nlm.nih.gov/pubmed/37415823
http://dx.doi.org/10.3389/fmicb.2023.1207209