Integrated multiple microarray studies by robust rank aggregation to identify immune-associated biomarkers in Crohn's disease based on three machine learning methods
Crohn's disease (CD) is a complex autoimmune disorder presumed to be driven by complex interactions of genetic, immune, microbial and even environmental factors. Intrinsic molecular mechanisms in CD, however, remain poorly understood. The identification of novel biomarkers in CD cases based on...
Main authors: | Chen, Zi-An; Ma, Hui-hui; Wang, Yan; Tian, Hui; Mi, Jian-wei; Yao, Dong-Mei; Yang, Chuan-Jie |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931764/ https://www.ncbi.nlm.nih.gov/pubmed/36792688 http://dx.doi.org/10.1038/s41598-022-26345-1 |
_version_ | 1784889305121947648 |
---|---|
author | Chen, Zi-An Ma, Hui-hui Wang, Yan Tian, Hui Mi, Jian-wei Yao, Dong-Mei Yang, Chuan-Jie |
author_sort | Chen, Zi-An |
collection | PubMed |
description | Crohn's disease (CD) is a complex autoimmune disorder presumed to be driven by interactions among genetic, immune, microbial, and environmental factors. The intrinsic molecular mechanisms of CD, however, remain poorly understood. Identifying novel biomarkers of CD from larger samples through machine learning approaches may inform diagnosis and treatment of the disease. A comprehensive analysis was conducted on all CD datasets in the Gene Expression Omnibus (GEO), and the robust rank aggregation (RRA) method was used to identify differentially expressed genes (DEGs) between controls and CD patients. Protein‒protein interaction (PPI) network and functional enrichment analyses were performed to investigate the potential functions of the DEGs, with molecular complex detection (MCODE) identifying important functional modules in the PPI network. Three machine learning algorithms, support vector machine-recursive feature elimination (SVM-RFE), random forest (RF), and least absolute shrinkage and selection operator (LASSO), were applied to determine characteristic genes, which were verified by ROC curve analysis and immunohistochemistry (IHC) using clinical samples. Univariable and multivariable logistic regression were used to establish a machine learning score for diagnosis. Single-sample GSEA (ssGSEA) was performed to examine the correlation between immune infiltration and biomarkers. In total, 5 datasets met the inclusion criteria: GSE75214, GSE95095, GSE126124, GSE179285, and GSE186582. Based on RRA integrated analysis, 203 significant DEGs were identified (120 upregulated genes and 83 downregulated genes), and MCODE revealed some important functional modules in the PPI network. Machine learning identified LCN2, REG1A, AQP9, CCL2, GIP, PROK2, DEFA5, CXCL9, and NAMPT; AQP9, PROK2, LCN2, and NAMPT were further verified by ROC curves and IHC in the external cohort. The final machine learning score was defined as [Expression level of AQP9 × (2.644)] + [Expression level of LCN2 × (0.958)] + [Expression level of NAMPT × (1.115)]. ssGSEA showed markedly elevated levels of dendritic cells and innate immune cells, such as macrophages and NK cells, in CD, consistent with the gene enrichment results showing that the DEGs are mainly involved in the IL-17 signaling pathway and humoral immune response. The selected biomarkers analyzed by the RRA method and machine learning are highly reliable. These findings improve our understanding of the molecular mechanisms of CD pathogenesis. |
format | Online Article Text |
id | pubmed-9931764 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9931764 2023-02-17 Integrated multiple microarray studies by robust rank aggregation to identify immune-associated biomarkers in Crohn's disease based on three machine learning methods Chen, Zi-An Ma, Hui-hui Wang, Yan Tian, Hui Mi, Jian-wei Yao, Dong-Mei Yang, Chuan-Jie Sci Rep Article Nature Publishing Group UK 2023-02-15 /pmc/articles/PMC9931764/ /pubmed/36792688 http://dx.doi.org/10.1038/s41598-022-26345-1 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | Integrated multiple microarray studies by robust rank aggregation to identify immune-associated biomarkers in Crohn's disease based on three machine learning methods |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931764/ https://www.ncbi.nlm.nih.gov/pubmed/36792688 http://dx.doi.org/10.1038/s41598-022-26345-1 |
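
The machine learning score given in the description field is a weighted sum of three expression levels, which can be sketched as follows. This is a minimal illustration only: the coefficients come from the abstract, while the function name, the dictionary input, and the example expression values are hypothetical. The abstract does not state the expression units, any intercept, or a diagnostic cutoff, so none are assumed here.

```python
# Coefficients as stated in the abstract of the record.
COEFFICIENTS = {"AQP9": 2.644, "LCN2": 0.958, "NAMPT": 1.115}


def machine_learning_score(expression):
    """Return the weighted sum of AQP9, LCN2 and NAMPT expression levels.

    `expression` maps gene symbols to expression levels. The name and
    input shape are illustrative assumptions, not taken from the paper.
    """
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {', '.join(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical expression levels, for illustration only (not study data).
example_sample = {"AQP9": 1.0, "LCN2": 2.0, "NAMPT": 3.0}
example_score = machine_learning_score(example_sample)
print(f"score = {example_score:.3f}")
```

Extra keys in the input are ignored, so a full expression profile can be passed without first subsetting it to the three genes.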