Identification of useful genes from multiple microarrays for ulcerative colitis diagnosis based on machine learning methods

Ulcerative colitis (UC) is a chronic relapsing inflammatory bowel disease with an increasing incidence and prevalence worldwide. The diagnosis for UC mainly relies on clinical symptoms and laboratory examinations. As some previous studies have revealed that there is an association between gene expre...

Bibliographic Details
Main Authors: Zhang, Lin, Mao, Rui, Lau, Chung Tai, Chung, Wai Chak, Chan, Jacky C. P., Liang, Feng, Zhao, Chenchen, Zhang, Xuan, Bian, Zhaoxiang
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200771/
https://www.ncbi.nlm.nih.gov/pubmed/35705632
http://dx.doi.org/10.1038/s41598-022-14048-6
_version_ 1784728139491966976
author Zhang, Lin
Mao, Rui
Lau, Chung Tai
Chung, Wai Chak
Chan, Jacky C. P.
Liang, Feng
Zhao, Chenchen
Zhang, Xuan
Bian, Zhaoxiang
author_facet Zhang, Lin
Mao, Rui
Lau, Chung Tai
Chung, Wai Chak
Chan, Jacky C. P.
Liang, Feng
Zhao, Chenchen
Zhang, Xuan
Bian, Zhaoxiang
author_sort Zhang, Lin
collection PubMed
description Ulcerative colitis (UC) is a chronic relapsing inflammatory bowel disease with an increasing incidence and prevalence worldwide. The diagnosis of UC mainly relies on clinical symptoms and laboratory examinations. As some previous studies have revealed an association between gene expression signature and disease severity, we aim to assess whether genes can help to diagnose UC and predict its correlation with immune regulation. A total of ten eligible microarrays (including 387 UC patients and 139 healthy subjects) were included in this study, with six microarrays (GSE48634, GSE6731, GSE114527, GSE13367, GSE36807, and GSE3629) in the training group and four microarrays (GSE53306, GSE87473, GSE74265, and GSE96665) in the testing group. After data processing, we found 87 differentially expressed genes. Furthermore, a total of six machine learning methods, including support vector machine, least absolute shrinkage and selection operator, random forest, gradient boosting machine, principal component analysis, and neural network, were adopted to identify potentially useful genes. The synthetic minority oversampling technique (SMOTE) was used to adjust the imbalanced sample size for two groups (if any). Consequently, six genes were selected for model establishment. According to the receiver operating characteristic, two genes, OLFM4 and C4BPB, were finally identified. The average values of area under curve for these two genes are higher than 0.8, in both the original datasets and the SMOTE-adjusted datasets. In addition, these two genes were also significantly correlated with six immune cells, namely Macrophages M1, Macrophages M2, Mast cells activated, Mast cells resting, Monocytes, and NK cells activated (P < 0.05). OLFM4 and C4BPB may be conducive to identifying patients with UC. Further verification studies could be conducted.
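The description above ranks candidate genes by the area under the receiver operating characteristic curve (AUC). As an illustration only, and not the authors' code, the following minimal Python sketch shows how an AUC for a single gene could be computed from expression values in patients and healthy subjects. The function name `single_gene_auc` and the expression values are hypothetical and are not taken from the study.

```python
def single_gene_auc(patient_values, control_values):
    """AUC as the probability that a randomly chosen patient has a higher
    expression value than a randomly chosen control (ties count as 0.5).
    This pairwise definition equals the area under the empirical ROC curve."""
    if not patient_values or not control_values:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for p in patient_values:
        for c in control_values:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_values) * len(control_values))


# Hypothetical expression values, for illustration only.
patients = [3.0, 4.0, 5.0]
controls = [1.0, 2.0, 3.0]
auc = single_gene_auc(patients, controls)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to no separation between the groups and 1.0 to perfect separation; the description reports average values above 0.8 for OLFM4 and C4BPB.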
format Online
Article
Text
id pubmed-9200771
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-92007712022-06-17 Identification of useful genes from multiple microarrays for ulcerative colitis diagnosis based on machine learning methods Zhang, Lin Mao, Rui Lau, Chung Tai Chung, Wai Chak Chan, Jacky C. P. Liang, Feng Zhao, Chenchen Zhang, Xuan Bian, Zhaoxiang Sci Rep Article Ulcerative colitis (UC) is a chronic relapsing inflammatory bowel disease with an increasing incidence and prevalence worldwide. The diagnosis for UC mainly relies on clinical symptoms and laboratory examinations. As some previous studies have revealed that there is an association between gene expression signature and disease severity, we thereby aim to assess whether genes can help to diagnose UC and predict its correlation with immune regulation. A total of ten eligible microarrays (including 387 UC patients and 139 healthy subjects) were included in this study, specifically with six microarrays (GSE48634, GSE6731, GSE114527, GSE13367, GSE36807, and GSE3629) in the training group and four microarrays (GSE53306, GSE87473, GSE74265, and GSE96665) in the testing group. After the data processing, we found 87 differently expressed genes. Furthermore, a total of six machine learning methods, including support vector machine, least absolute shrinkage and selection operator, random forest, gradient boosting machine, principal component analysis, and neural network were adopted to identify potentially useful genes. The synthetic minority oversampling (SMOTE) was used to adjust the imbalanced sample size for two groups (if any). Consequently, six genes were selected for model establishment. According to the receiver operating characteristic, two genes of OLFM4 and C4BPB were finally identified. The average values of area under curve for these two genes are higher than 0.8, either in the original datasets or SMOTE-adjusted datasets. 
Besides, these two genes also significantly correlated to six immune cells, namely Macrophages M1, Macrophages M2, Mast cells activated, Mast cells resting, Monocytes, and NK cells activated (P  <  0.05). OLFM4 and C4BPB may be conducive to identifying patients with UC. Further verification studies could be conducted. Nature Publishing Group UK 2022-06-15 /pmc/articles/PMC9200771/ /pubmed/35705632 http://dx.doi.org/10.1038/s41598-022-14048-6 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) .
spellingShingle Article
Zhang, Lin
Mao, Rui
Lau, Chung Tai
Chung, Wai Chak
Chan, Jacky C. P.
Liang, Feng
Zhao, Chenchen
Zhang, Xuan
Bian, Zhaoxiang
Identification of useful genes from multiple microarrays for ulcerative colitis diagnosis based on machine learning methods
title Identification of useful genes from multiple microarrays for ulcerative colitis diagnosis based on machine learning methods
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200771/
https://www.ncbi.nlm.nih.gov/pubmed/35705632
http://dx.doi.org/10.1038/s41598-022-14048-6