
Decoding Diabetes Biomarkers and Related Molecular Mechanisms by Using Machine Learning, Text Mining, and Gene Expression Analysis

The molecular basis of diabetes mellitus is yet to be fully elucidated. We aimed to identify the most frequently reported and differentially expressed genes (DEGs) in diabetes by using bioinformatics approaches. Text mining was used to screen 40,225 article abstracts from diabetes literature. These st...


Bibliographic Details
Main Authors: Elsherbini, Amira M., Alsamman, Alsamman M., Elsherbiny, Nehal M., El-Sherbiny, Mohamed, Ahmed, Rehab, Ebrahim, Hasnaa Ali, Bakkach, Joaira
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9656783/
https://www.ncbi.nlm.nih.gov/pubmed/36360783
http://dx.doi.org/10.3390/ijerph192113890
author Elsherbini, Amira M.
Alsamman, Alsamman M.
Elsherbiny, Nehal M.
El-Sherbiny, Mohamed
Ahmed, Rehab
Ebrahim, Hasnaa Ali
Bakkach, Joaira
collection PubMed
description The molecular basis of diabetes mellitus is yet to be fully elucidated. We aimed to identify the most frequently reported and differentially expressed genes (DEGs) in diabetes by using bioinformatics approaches. Text mining was used to screen 40,225 article abstracts from diabetes literature. These studies highlighted 5939 diabetes-related genes spread across 22 human chromosomes, with 112 genes mentioned in more than 50 studies. Among these genes, HNF4A, PPARA, VEGFA, TCF7L2, HLA-DRB1, PPARG, NOS3, KCNJ11, PRKAA2, and HNF1A were mentioned in more than 200 articles. These genes are correlated with the regulation of glycogen and polysaccharide, adipogenesis, AGE/RAGE, and macrophage differentiation. Three datasets (44 patients and 57 controls) were subjected to gene expression analysis. The analysis revealed 135 significant DEGs, of which CEACAM6, ENPP4, HDAC5, HPCAL1, PARVG, STYXL1, VPS28, ZBTB33, ZFP37, and CCDC58 were the top 10 DEGs. These genes were enriched in aerobic respiration, T-cell antigen receptor pathway, tricarboxylic acid metabolic process, vitamin D receptor pathway, toll-like receptor signaling, and endoplasmic reticulum (ER) unfolded protein response. The results of the text mining and gene expression analyses were used as attribute values for machine learning (ML) analysis. The decision tree, extra-tree regressor, and random forest algorithms were used in the ML analysis to identify unique markers that could be used as diabetes diagnosis tools. These algorithms produced prediction models with accuracies ranging from 0.6364 to 0.88 and an overall confidence interval (CI) of 95%. There were 39 biomarkers that could distinguish diabetic from non-diabetic patients, 12 of which were repeated multiple times. The majority of these genes are associated with stress response, signalling regulation, locomotion, cell motility, growth, and muscle adaptation. Machine learning algorithms highlighted the use of the HLA-DQB1 gene as a biomarker for early detection of diabetes. Our data mining and gene expression analysis have provided useful information about potential biomarkers in diabetes.
format Online
Article
Text
id pubmed-9656783
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9656783 2022-11-15 Int J Environ Res Public Health Article MDPI 2022-10-26 /pmc/articles/PMC9656783/ /pubmed/36360783 http://dx.doi.org/10.3390/ijerph192113890 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Decoding Diabetes Biomarkers and Related Molecular Mechanisms by Using Machine Learning, Text Mining, and Gene Expression Analysis
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9656783/
https://www.ncbi.nlm.nih.gov/pubmed/36360783
http://dx.doi.org/10.3390/ijerph192113890