Identification of useful genes from multiple microarrays for ulcerative colitis diagnosis based on machine learning methods
Ulcerative colitis (UC) is a chronic relapsing inflammatory bowel disease with an increasing incidence and prevalence worldwide. The diagnosis for UC mainly relies on clinical symptoms and laboratory examinations. As some previous studies have revealed that there is an association between gene expre...
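Main authors: | Zhang, Lin; Mao, Rui; Lau, Chung Tai; Chung, Wai Chak; Chan, Jacky C. P.; Liang, Feng; Zhao, Chenchen; Zhang, Xuan; Bian, Zhaoxiang |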
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200771/ https://www.ncbi.nlm.nih.gov/pubmed/35705632 http://dx.doi.org/10.1038/s41598-022-14048-6 |
_version_ | 1784728139491966976 |
---|---|
author | Zhang, Lin; Mao, Rui; Lau, Chung Tai; Chung, Wai Chak; Chan, Jacky C. P.; Liang, Feng; Zhao, Chenchen; Zhang, Xuan; Bian, Zhaoxiang |
author_sort | Zhang, Lin |
collection | PubMed |
description | Ulcerative colitis (UC) is a chronic relapsing inflammatory bowel disease with an increasing incidence and prevalence worldwide. The diagnosis for UC mainly relies on clinical symptoms and laboratory examinations. As some previous studies have revealed that there is an association between gene expression signature and disease severity, we thereby aim to assess whether genes can help to diagnose UC and predict its correlation with immune regulation. A total of ten eligible microarrays (including 387 UC patients and 139 healthy subjects) were included in this study, specifically with six microarrays (GSE48634, GSE6731, GSE114527, GSE13367, GSE36807, and GSE3629) in the training group and four microarrays (GSE53306, GSE87473, GSE74265, and GSE96665) in the testing group. After the data processing, we found 87 differentially expressed genes. Furthermore, a total of six machine learning methods, including support vector machine, least absolute shrinkage and selection operator, random forest, gradient boosting machine, principal component analysis, and neural network, were adopted to identify potentially useful genes. The synthetic minority oversampling technique (SMOTE) was used to adjust the imbalanced sample size for two groups (if any). Consequently, six genes were selected for model establishment. According to the receiver operating characteristic, two genes, OLFM4 and C4BPB, were finally identified. The average values of area under curve for these two genes are higher than 0.8, either in the original datasets or SMOTE-adjusted datasets. Besides, these two genes also significantly correlated with six immune cells, namely Macrophages M1, Macrophages M2, Mast cells activated, Mast cells resting, Monocytes, and NK cells activated (P < 0.05). OLFM4 and C4BPB may be conducive to identifying patients with UC. Further verification studies could be conducted. |
format | Online Article Text |
id | pubmed-9200771 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9200771 2022-06-17 Sci Rep Article Nature Publishing Group UK 2022-06-15 /pmc/articles/PMC9200771/ /pubmed/35705632 http://dx.doi.org/10.1038/s41598-022-14048-6 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | Identification of useful genes from multiple microarrays for ulcerative colitis diagnosis based on machine learning methods |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200771/ https://www.ncbi.nlm.nih.gov/pubmed/35705632 http://dx.doi.org/10.1038/s41598-022-14048-6 |
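
The description above reports area-under-curve values for individual genes. As a minimal sketch of what a single-gene AUC measures, the following computes it directly from two groups of expression values. This is not the authors' pipeline: the function name `single_gene_auc` and the expression values are hypothetical and invented for illustration only.

```python
from typing import Sequence


def single_gene_auc(case_values: Sequence[float],
                    control_values: Sequence[float]) -> float:
    """AUC of one gene's expression used as a case-vs-control score.

    Equals the probability that a randomly chosen case has a higher value
    than a randomly chosen control, with ties counted as one half
    (the Mann-Whitney U statistic divided by n_cases * n_controls).
    """
    if not case_values or not control_values:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical expression values, NOT taken from the study.
uc_samples = [8.1, 7.4, 9.0, 6.2]
healthy_samples = [5.0, 6.5, 4.8]

auc = single_gene_auc(uc_samples, healthy_samples)
print(f"AUC = {auc:.3f}")  # 11 of 12 case/control pairs are ordered correctly
```

An AUC of 0.5 means the gene does not separate the groups at all, and 1.0 means perfect separation. The pairwise form is quadratic in sample count, which is acceptable for a few hundred samples; a rank-based implementation would be preferable for larger cohorts.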