
Diagnostic model based on bioinformatics and machine learning to distinguish Kawasaki disease using multiple datasets

BACKGROUND: Kawasaki disease (KD), characterized by systemic vasculitis, is the leading cause of acquired heart disease in children. Herein, we developed a diagnostic model, with some prognostic ability, to help distinguish children with KD. METHODS: Gene expression datasets were downloaded from Gene...


Bibliographic Details
Main authors: Zhang, Mengyi, Ke, Bocuo, Zhuo, Huichuan, Guo, Binhan
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9425821/
https://www.ncbi.nlm.nih.gov/pubmed/36042431
http://dx.doi.org/10.1186/s12887-022-03557-y
author Zhang, Mengyi
Ke, Bocuo
Zhuo, Huichuan
Guo, Binhan
collection PubMed
description BACKGROUND: Kawasaki disease (KD), characterized by systemic vasculitis, is the leading cause of acquired heart disease in children. Herein, we developed a diagnostic model, with some prognostic ability, to help distinguish children with KD. METHODS: Gene expression datasets were downloaded from Gene Expression Omnibus (GEO), and gene sets with a potential pathogenic mechanism in KD were identified using differentially expressed gene (DEG) screening, pathway enrichment analysis, random forest (RF) screening, and artificial neural network (ANN) construction. RESULTS: We extracted 2,017 DEGs (1,130 upregulated and 887 downregulated) from GEO. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses showed that the DEGs were significantly enriched in innate/adaptive immune response-related processes. Subsequently, the results of weighted gene co-expression network analysis and DEG screening were combined and, using RF and ANN, a model with eight genes (VPS9D1, CACNA1E, SH3GLB1, RAB32, ADM, GYG1, PGS1, and HIST2H2AC) was constructed. Classification results of the new model for KD diagnosis showed excellent performance for different datasets, including those of patients with KD, convalescents, and healthy individuals, with area under the curve values of 1, 0.945, and 0.95, respectively. CONCLUSIONS: We used machine learning methods to construct and validate a diagnostic model using multiple bioinformatic datasets, and identified molecules expected to serve as new biomarkers for, or therapeutic targets in, KD. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12887-022-03557-y.
format Online
Article
Text
id pubmed-9425821
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9425821 2022-08-30 BMC Pediatr Research BioMed Central 2022-08-30 /pmc/articles/PMC9425821/ /pubmed/36042431 http://dx.doi.org/10.1186/s12887-022-03557-y Text en
© The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Diagnostic model based on bioinformatics and machine learning to distinguish Kawasaki disease using multiple datasets
topic Research
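The abstract in this record reports area under the curve (AUC) values of 1, 0.945, and 0.95. As a minimal sketch of what that metric measures, and not the authors' code, the Python below computes AUC as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher classifier score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (e.g. KD), 0 for the negative class.
    scores: classifier outputs, higher meaning "more likely positive".
    A tie between a positive and a negative score counts as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Under this definition, an AUC of 1 means every positive sample scored above every negative sample, and 0.5 corresponds to a ranking no better than chance.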