Establishment and analysis of a disease risk prediction model for the systemic lupus erythematosus with random forest
Systemic lupus erythematosus (SLE) is a latent, insidious autoimmune disease, and with the development of gene sequencing in recent years, our study aims to develop a gene-based predictive model to explore the identification of SLE at the genetic level. First, gene expression datasets of SLE whole b...
Main authors: | Chen, Huajian; Huang, Li; Jiang, Xinyue; Wang, Yue; Bian, Yan; Ma, Shumei; Liu, Xiaodong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Immunology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9667742/ https://www.ncbi.nlm.nih.gov/pubmed/36405750 http://dx.doi.org/10.3389/fimmu.2022.1025688 |
_version_ | 1784831779302014976 |
---|---|
author | Chen, Huajian Huang, Li Jiang, Xinyue Wang, Yue Bian, Yan Ma, Shumei Liu, Xiaodong |
author_facet | Chen, Huajian Huang, Li Jiang, Xinyue Wang, Yue Bian, Yan Ma, Shumei Liu, Xiaodong |
author_sort | Chen, Huajian |
collection | PubMed |
description | Systemic lupus erythematosus (SLE) is a latent, insidious autoimmune disease, and with the development of gene sequencing in recent years, our study aims to develop a gene-based predictive model to explore the identification of SLE at the genetic level. First, gene expression datasets of SLE whole blood samples were collected from the Gene Expression Omnibus (GEO) database. After the datasets were merged, they were divided into training and validation datasets in the ratio of 7:3, where the SLE samples and healthy samples of the training dataset were 334 and 71, respectively, and the SLE samples and healthy samples of the validation dataset were 143 and 30, respectively. The training dataset was used to build the disease risk prediction model, and the validation dataset was used to verify the model identification ability. We first analyzed differentially expressed genes (DEGs) and then used Lasso and random forest (RF) to screen out six key genes (OAS3, USP18, RTP4, SPATS2L, IFI27 and OAS1), which are essential to distinguish SLE from healthy samples. With six key genes incorporated and five iterations of 10-fold cross-validation performed into the RF model, we finally determined the RF model with optimal mtry. The mean values of area under the curve (AUC) and accuracy of the models were over 0.95. The validation dataset was then used to evaluate the AUC performance and our model had an AUC of 0.948. An external validation dataset (GSE99967) with an AUC of 0.810, an accuracy of 0.836, and a sensitivity of 0.921 was used to assess the model’s performance. The external validation dataset (GSE185047) of all SLE patients yielded an SLE sensitivity of up to 0.954. The final high-throughput RF model had a mean value of AUC over 0.9, again showing good results. In conclusion, we identified key genetic biomarkers and successfully developed a novel disease risk prediction model for SLE that can be used as a new SLE disease risk prediction aid and contribute to the identification of SLE. |
format | Online Article Text |
id | pubmed-9667742 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-96677422022-11-17 Establishment and analysis of a disease risk prediction model for the systemic lupus erythematosus with random forest Chen, Huajian Huang, Li Jiang, Xinyue Wang, Yue Bian, Yan Ma, Shumei Liu, Xiaodong Front Immunol Immunology Systemic lupus erythematosus (SLE) is a latent, insidious autoimmune disease, and with the development of gene sequencing in recent years, our study aims to develop a gene-based predictive model to explore the identification of SLE at the genetic level. First, gene expression datasets of SLE whole blood samples were collected from the Gene Expression Omnibus (GEO) database. After the datasets were merged, they were divided into training and validation datasets in the ratio of 7:3, where the SLE samples and healthy samples of the training dataset were 334 and 71, respectively, and the SLE samples and healthy samples of the validation dataset were 143 and 30, respectively. The training dataset was used to build the disease risk prediction model, and the validation dataset was used to verify the model identification ability. We first analyzed differentially expressed genes (DEGs) and then used Lasso and random forest (RF) to screen out six key genes (OAS3, USP18, RTP4, SPATS2L, IFI27 and OAS1), which are essential to distinguish SLE from healthy samples. With six key genes incorporated and five iterations of 10-fold cross-validation performed into the RF model, we finally determined the RF model with optimal mtry. The mean values of area under the curve (AUC) and accuracy of the models were over 0.95. The validation dataset was then used to evaluate the AUC performance and our model had an AUC of 0.948. An external validation dataset (GSE99967) with an AUC of 0.810, an accuracy of 0.836, and a sensitivity of 0.921 was used to assess the model’s performance. The external validation dataset (GSE185047) of all SLE patients yielded an SLE sensitivity of up to 0.954. The final high-throughput RF model had a mean value of AUC over 0.9, again showing good results. In conclusion, we identified key genetic biomarkers and successfully developed a novel disease risk prediction model for SLE that can be used as a new SLE disease risk prediction aid and contribute to the identification of SLE. Frontiers Media S.A. 2022-11-01 /pmc/articles/PMC9667742/ /pubmed/36405750 http://dx.doi.org/10.3389/fimmu.2022.1025688 Text en Copyright © 2022 Chen, Huang, Jiang, Wang, Bian, Ma and Liu https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Immunology Chen, Huajian Huang, Li Jiang, Xinyue Wang, Yue Bian, Yan Ma, Shumei Liu, Xiaodong Establishment and analysis of a disease risk prediction model for the systemic lupus erythematosus with random forest |
title | Establishment and analysis of a disease risk prediction model for the systemic lupus erythematosus with random forest |
title_full | Establishment and analysis of a disease risk prediction model for the systemic lupus erythematosus with random forest |
title_fullStr | Establishment and analysis of a disease risk prediction model for the systemic lupus erythematosus with random forest |
title_full_unstemmed | Establishment and analysis of a disease risk prediction model for the systemic lupus erythematosus with random forest |
title_short | Establishment and analysis of a disease risk prediction model for the systemic lupus erythematosus with random forest |
title_sort | establishment and analysis of a disease risk prediction model for the systemic lupus erythematosus with random forest |
topic | Immunology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9667742/ https://www.ncbi.nlm.nih.gov/pubmed/36405750 http://dx.doi.org/10.3389/fimmu.2022.1025688 |
work_keys_str_mv | AT chenhuajian establishmentandanalysisofadiseaseriskpredictionmodelforthesystemiclupuserythematosuswithrandomforest AT huangli establishmentandanalysisofadiseaseriskpredictionmodelforthesystemiclupuserythematosuswithrandomforest AT jiangxinyue establishmentandanalysisofadiseaseriskpredictionmodelforthesystemiclupuserythematosuswithrandomforest AT wangyue establishmentandanalysisofadiseaseriskpredictionmodelforthesystemiclupuserythematosuswithrandomforest AT bianyan establishmentandanalysisofadiseaseriskpredictionmodelforthesystemiclupuserythematosuswithrandomforest AT mashumei establishmentandanalysisofadiseaseriskpredictionmodelforthesystemiclupuserythematosuswithrandomforest AT liuxiaodong establishmentandanalysisofadiseaseriskpredictionmodelforthesystemiclupuserythematosuswithrandomforest |
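
The description field above outlines a 7:3 training/validation split, five repeats of 10-fold cross-validation, and evaluation by AUC. The following is a minimal Python sketch of those three steps only, not the authors' code: the record does not say which software or splitting procedure was used, so the function names (`stratified_split`, `repeated_kfold`, `auc`), the seeds, and the per-class rounding rule are all assumptions made for illustration. Fitting the random forest and tuning `mtry` are left out because they require a modelling library. The class totals (477 SLE, 101 healthy) are simply the sums of the counts reported in the abstract.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split sample indices into train/validation, preserving class ratios.

    Assumption: each class is split separately and the training count is
    rounded to the nearest integer.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, valid = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * train_frac)
        train.extend(idxs[:n_train])
        valid.extend(idxs[n_train:])
    return sorted(train), sorted(valid)


def repeated_kfold(indices, k=10, repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Folds are reshuffled on every repeat; they are not stratified here.
    """
    rng = random.Random(seed)
    for rep in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != f for i in fold]
            yield rep, f, train_idx, test_idx


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Class totals implied by the abstract: 334 + 143 SLE (label 1), 71 + 30 healthy (label 0).
labels = [1] * 477 + [0] * 101
train_idx, valid_idx = stratified_split(labels)
n_train_sle = sum(labels[i] for i in train_idx)
n_valid_sle = sum(labels[i] for i in valid_idx)
print(n_train_sle, len(train_idx) - n_train_sle)  # 334 71
print(n_valid_sle, len(valid_idx) - n_valid_sle)  # 143 30

splits = list(repeated_kfold(train_idx))
print(len(splits))  # 50 train/test partitions: 5 repeats x 10 folds
```

Under this rounding assumption, a per-class 70% split reproduces the training and validation counts given in the abstract. In a full pipeline, each of the 50 partitions would be used to fit a model for every candidate `mtry` value, and the mean of `auc` over the held-out folds would guide the choice.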