Integrated multiple microarray studies by robust rank aggregation to identify immune-associated biomarkers in Crohn's disease based on three machine learning methods

Crohn's disease (CD) is a complex autoimmune disorder presumed to be driven by interactions of genetic, immune, microbial and environmental factors. Intrinsic molecular mechanisms in CD, however, remain poorly understood. The identification of novel biomarkers in CD cases based on...

Bibliographic Details
Main authors: Chen, Zi-An, Ma, Hui-hui, Wang, Yan, Tian, Hui, Mi, Jian-wei, Yao, Dong-Mei, Yang, Chuan-Jie
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931764/
https://www.ncbi.nlm.nih.gov/pubmed/36792688
http://dx.doi.org/10.1038/s41598-022-26345-1
_version_ 1784889305121947648
author Chen, Zi-An
Ma, Hui-hui
Wang, Yan
Tian, Hui
Mi, Jian-wei
Yao, Dong-Mei
Yang, Chuan-Jie
author_sort Chen, Zi-An
collection PubMed
description Crohn's disease (CD) is a complex autoimmune disorder presumed to be driven by interactions of genetic, immune, microbial and environmental factors. Intrinsic molecular mechanisms in CD, however, remain poorly understood. The identification of novel biomarkers in CD cases based on larger samples through machine learning approaches may inform the diagnosis and treatment of the disease. A comprehensive analysis was conducted on all CD datasets in the Gene Expression Omnibus (GEO); the robust rank aggregation (RRA) method was then used to identify differentially expressed genes (DEGs) between controls and CD patients. Protein‒protein interaction (PPI) network and functional enrichment analyses were performed to investigate the potential functions of the DEGs, with molecular complex detection (MCODE) identifying important functional modules in the PPI network. Three machine learning algorithms, support vector machine-recursive feature elimination (SVM-RFE), random forest (RF), and least absolute shrinkage and selection operator (LASSO), were applied to determine characteristic genes, which were verified by ROC curve analysis and by immunohistochemistry (IHC) on clinical samples. Univariable and multivariable logistic regression were used to establish a machine learning score for diagnosis. Single-sample GSEA (ssGSEA) was performed to examine the correlation between immune infiltration and biomarkers. In total, 5 datasets met the inclusion criteria: GSE75214, GSE95095, GSE126124, GSE179285, and GSE186582. Based on RRA integrated analysis, 203 significant DEGs were identified (120 upregulated and 83 downregulated), and MCODE revealed important functional modules in the PPI network. Machine learning identified LCN2, REG1A, AQP9, CCL2, GIP, PROK2, DEFA5, CXCL9, and NAMPT; AQP9, PROK2, LCN2, and NAMPT were further verified by ROC curves and IHC in the external cohort. The final machine learning score was defined as [Expression level of AQP9 × (2.644)] + [Expression level of LCN2 × (0.958)] + [Expression level of NAMPT × (1.115)]. ssGSEA showed markedly elevated levels of dendritic cells and innate immune cells, such as macrophages and NK cells, in CD, consistent with the gene enrichment results indicating that the DEGs are mainly involved in the IL-17 signaling pathway and the humoral immune response. The selected biomarkers analyzed by the RRA method and machine learning are highly reliable. These findings improve our understanding of the molecular mechanisms of CD pathogenesis.
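The score quoted in the description is a plain weighted sum of three expression levels, so it can be sketched in a few lines of Python. Only the three coefficients come from the abstract. The function name, the dictionary layout, and the example expression values are hypothetical, and the abstract does not state an intercept, a normalization scheme, or a diagnostic cutoff, so none is assumed here.

```python
# Coefficients as quoted in the abstract; everything else is illustrative.
COEFFICIENTS = {"AQP9": 2.644, "LCN2": 0.958, "NAMPT": 1.115}


def machine_learning_score(expression):
    """Return the weighted sum of AQP9, LCN2 and NAMPT expression levels.

    `expression` maps gene symbols to expression levels. The units and
    normalization are whatever the caller used; the abstract does not
    specify them.
    """
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError("missing expression values for: " + ", ".join(missing))
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical expression values, chosen only to show the arithmetic:
# 1.0 * 2.644 + 2.0 * 0.958 + 3.0 * 1.115
example = {"AQP9": 1.0, "LCN2": 2.0, "NAMPT": 3.0}
score = machine_learning_score(example)
```

Keeping the coefficients in a single mapping makes it easy to see that PROK2, although verified by ROC curves and IHC, does not appear in the final score as quoted.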
format Online
Article
Text
id pubmed-9931764
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9931764 2023-02-17 Sci Rep Article Nature Publishing Group UK 2023-02-15 /pmc/articles/PMC9931764/ /pubmed/36792688 http://dx.doi.org/10.1038/s41598-022-26345-1 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title Integrated multiple microarray studies by robust rank aggregation to identify immune-associated biomarkers in Crohn's disease based on three machine learning methods
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931764/
https://www.ncbi.nlm.nih.gov/pubmed/36792688
http://dx.doi.org/10.1038/s41598-022-26345-1