Comparison of ischemic stroke diagnosis models based on machine learning

Bibliographic details
Main authors: Yang, Wan-Xia; Wang, Fang-Fang; Pan, Yun-Yan; Xie, Jian-Qin; Lu, Ming-Hua; You, Chong-Ge
Format: Online, Article, Text
Language: English
Published: Frontiers Media S.A., 2022
Subjects: Neurology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9762505/
https://www.ncbi.nlm.nih.gov/pubmed/36545400
http://dx.doi.org/10.3389/fneur.2022.1014346
author Yang, Wan-Xia
Wang, Fang-Fang
Pan, Yun-Yan
Xie, Jian-Qin
Lu, Ming-Hua
You, Chong-Ge
collection PubMed
description BACKGROUND: The incidence, prevalence, and mortality of ischemic stroke (IS) continue to rise, imposing a serious global disease burden. Prediction models are of great value for the early prediction and diagnosis of IS.
METHODS: R software was used to screen the differentially expressed genes (DEGs) between IS and control samples in the datasets GSE16561, GSE58294, and GSE37587 and to perform enrichment analysis on the DEGs. Feature genes of IS were selected with several machine learning algorithms: least absolute shrinkage and selection operator (LASSO) logistic regression, support vector machine-recursive feature elimination (SVM-RFE), and random forest (RF). IS diagnostic models were then constructed from the transcriptomic data using machine learning and an artificial neural network (ANN).
RESULTS: A total of 69 DEGs, mainly involved in immune and inflammatory responses, were identified. The pathways enriched in the IS group were complement and coagulation cascades, lysosome, PPAR signaling pathway, regulation of autophagy, and Toll-like receptor signaling pathway. LASSO, SVM-RFE, and RF selected 17, 10, and 12 feature genes, respectively. The area under the curve (AUC) of the LASSO model in the training dataset, GSE22255, and GSE195442 was 0.969, 0.890, and 1.000, respectively. The corresponding AUCs were 0.957, 0.805, and 1.000 for the SVM-RFE model and 0.947, 0.935, and 1.000 for the RF model. These three models showed good sensitivity, specificity, and accuracy. In the training dataset, the AUC of the LASSO+ANN, SVM-RFE+ANN, and RF+ANN models was 1.000, 0.995, and 0.997, respectively. In the GSE22255 dataset, however, the AUC of the LASSO+ANN, SVM-RFE+ANN, and RF+ANN models was only 0.688, 0.605, and 0.619, respectively. In the GSE195442 dataset, the AUC of the LASSO+ANN and RF+ANN models was 0.740 and 0.630, respectively. In the training dataset, the sensitivity, specificity, and accuracy were 1.000, 1.000, and 1.000 for the LASSO+ANN model; 0.946, 0.982, and 0.964 for the SVM-RFE+ANN model; and 0.964, 1.000, and 0.982 for the RF+ANN model. In the test datasets, sensitivity was very satisfactory, but specificity and accuracy were poor.
CONCLUSION: The LASSO, SVM-RFE, and RF models have good predictive ability. The ANN model, however, is efficient at classifying positive samples but unsuitable for classifying negative samples.
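The metrics reported in the abstract (AUC, sensitivity, specificity, accuracy) can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' R pipeline; the function names `confusion_counts`, `sens_spec_acc`, and `auc`, the 0.5 cutoff, and the toy labels and scores are all hypothetical. It computes sensitivity, specificity, and accuracy from thresholded predictions, and AUC through the Mann-Whitney pairwise-ranking formulation. The toy data show how a classifier that labels nearly every sample positive can reach perfect sensitivity while specificity stays low, the pattern the conclusion attributes to the ANN model.

```python
# Illustrative sketch, not the authors' R pipeline: binary diagnostic metrics
# for an IS (label 1) vs. control (label 0) classifier. All names and the
# toy data below are hypothetical.
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP), treating 1 as IS and 0 as control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def sens_spec_acc(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Sensitivity, specificity, and accuracy of thresholded predictions."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)  # fraction of IS samples labelled positive
    specificity = tn / (tn + fp)  # fraction of controls labelled negative
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the probability that a random
    positive scores higher than a random negative, counting ties as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: four IS samples followed by four controls.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.65, 0.55, 0.5, 0.2]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical cutoff 0.5
    print(sens_spec_acc(y_true, y_pred))  # (1.0, 0.25, 0.625)
    print(auc(y_true, scores))            # 0.9375
```

With the cutoff at 0.5, the toy classifier catches every IS sample (sensitivity 1.0) but mislabels three of the four controls (specificity 0.25), even though its threshold-free AUC is 0.9375. Sensitivity, specificity, and accuracy depend on where the cutoff is placed, whereas AUC summarizes how well the scores rank positives above negatives across all cutoffs.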
format Online
Article
Text
id pubmed-9762505
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
journal Front Neurol (subject: Neurology), Frontiers Media S.A., published 2022-12-05
rights Copyright © 2022 Yang, Wang, Pan, Xie, Lu and You. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Comparison of ischemic stroke diagnosis models based on machine learning
topic Neurology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9762505/
https://www.ncbi.nlm.nih.gov/pubmed/36545400
http://dx.doi.org/10.3389/fneur.2022.1014346