Diagnostic model based on bioinformatics and machine learning to distinguish Kawasaki disease using multiple datasets
Main authors: | Zhang, Mengyi; Ke, Bocuo; Zhuo, Huichuan; Guo, Binhan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9425821/ https://www.ncbi.nlm.nih.gov/pubmed/36042431 http://dx.doi.org/10.1186/s12887-022-03557-y |
author | Zhang, Mengyi; Ke, Bocuo; Zhuo, Huichuan; Guo, Binhan |
---|---|
collection | PubMed |
description | BACKGROUND: Kawasaki disease (KD), characterized by systemic vasculitis, is the leading cause of acquired heart disease in children. Herein, we developed a diagnostic model, with some prognostic ability, to help distinguish children with KD. METHODS: Gene expression datasets were downloaded from Gene Expression Omnibus (GEO), and gene sets with a potential pathogenic mechanism in KD were identified using differentially expressed gene (DEG) screening, pathway enrichment analysis, random forest (RF) screening, and artificial neural network (ANN) construction. RESULTS: We extracted 2,017 DEGs (1,130 upregulated and 887 downregulated) from GEO. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses showed that the DEGs were significantly enriched in innate and adaptive immune response-related processes. Subsequently, the results of weighted gene co-expression network analysis and DEG screening were combined and, using RF and ANN, a model with eight genes (VPS9D1, CACNA1E, SH3GLB1, RAB32, ADM, GYG1, PGS1, and HIST2H2AC) was constructed. Classification results of the new model for KD diagnosis showed excellent performance across different datasets, including those of patients with KD, convalescents, and healthy individuals, with area under the curve values of 1, 0.945, and 0.95, respectively. CONCLUSIONS: We used machine learning methods to construct and validate a diagnostic model using multiple bioinformatic datasets, and identified molecules expected to serve as new biomarkers of, or therapeutic targets in, KD. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12887-022-03557-y. |
format | Online Article Text |
id | pubmed-9425821 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Pediatr (Research), 2022-08-30 |
rights | © The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Diagnostic model based on bioinformatics and machine learning to distinguish Kawasaki disease using multiple datasets |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9425821/ https://www.ncbi.nlm.nih.gov/pubmed/36042431 http://dx.doi.org/10.1186/s12887-022-03557-y |
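
The abstract reports model performance as area under the (ROC) curve values of 1, 0.945, and 0.95. As an illustration only, the sketch below computes ROC AUC from scratch using its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half (the normalized Mann-Whitney U statistic). The function name `roc_auc` and the toy scores are hypothetical; this is not the authors' code, which according to the abstract relied on random forest and artificial neural network models.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Hypothetical helper: ROC AUC via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive sample (label 1) scores higher than the negative sample
    (label 0); tied pairs contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores (not from the study): perfect separation gives 1.0.
    print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # 1.0
    # One of four positive/negative pairs is misranked: 3/4 = 0.75.
    print(roc_auc([0.9, 0.4, 0.5, 0.2], [1, 1, 0, 0]))  # 0.75
```

An AUC of 1 therefore means every positive sample outscored every negative sample in that dataset; the pairwise loop is O(n·m) and is meant for readability, whereas a rank-based computation scales better for large cohorts.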