Graph data science and machine learning for the detection of COVID-19 infection from symptoms
BACKGROUND: COVID-19 is an infectious disease caused by SARS-CoV-2. The symptoms of COVID-19 vary from mild-to-moderate respiratory illnesses, and it sometimes requires urgent medication. Therefore, it is crucial to detect COVID-19 at an early stage through specific clinical tests, testing kits, and...
Main Authors: | Alqaissi, Eman; Alotaibi, Fahd; Ramzan, Muhammad Sher |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2023 |
Subjects: | Bioinformatics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280642/ https://www.ncbi.nlm.nih.gov/pubmed/37346701 http://dx.doi.org/10.7717/peerj-cs.1333 |
_version_ | 1785060842541154304 |
---|---|
author | Alqaissi, Eman Alotaibi, Fahd Ramzan, Muhammad Sher |
author_facet | Alqaissi, Eman Alotaibi, Fahd Ramzan, Muhammad Sher |
author_sort | Alqaissi, Eman |
collection | PubMed |
description | BACKGROUND: COVID-19 is an infectious disease caused by SARS-CoV-2. The symptoms of COVID-19 vary from mild-to-moderate respiratory illnesses, and it sometimes requires urgent medication. Therefore, it is crucial to detect COVID-19 at an early stage through specific clinical tests, testing kits, and medical devices. However, these tests are not always available during the time of the pandemic. Therefore, this study developed an automatic, intelligent, rapid, and real-time diagnostic model for the early detection of COVID-19 based on its symptoms. METHODS: The COVID-19 knowledge graph (KG) constructed based on literature from heterogeneous data is imported to understand the COVID-19 different relations. We added human disease ontology to the COVID-19 KG and applied a node-embedding graph algorithm called fast random projection to extract an extra feature from the COVID-19 dataset. Subsequently, experiments were conducted using two machine learning (ML) pipelines to predict COVID-19 infection from its symptoms. Additionally, automatic tuning of the model hyperparameters was adopted. RESULTS: We compared two graph-based ML models, logistic regression (LR) and random forest (RF) models. The proposed graph-based RF model achieved a small error rate = 0.0064 and the best scores on all performance metrics, including specificity = 98.71%, accuracy = 99.36%, precision = 99.65%, recall = 99.53%, and F1-score = 99.59%. Furthermore, the Matthews correlation coefficient achieved by the RF model was higher than that of the LR model. Comparative analysis with other ML algorithms and with studies from the literature showed that the proposed RF model exhibited the best detection accuracy. CONCLUSION: The graph-based RF model registered high performance in classifying the symptoms of COVID-19 infection, thereby indicating that the graph data science, in conjunction with ML techniques, helps improve performance and accelerate innovations. |
format | Online Article Text |
id | pubmed-10280642 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-102806422023-06-21 Graph data science and machine learning for the detection of COVID-19 infection from symptoms Alqaissi, Eman Alotaibi, Fahd Ramzan, Muhammad Sher PeerJ Comput Sci Bioinformatics BACKGROUND: COVID-19 is an infectious disease caused by SARS-CoV-2. The symptoms of COVID-19 vary from mild-to-moderate respiratory illnesses, and it sometimes requires urgent medication. Therefore, it is crucial to detect COVID-19 at an early stage through specific clinical tests, testing kits, and medical devices. However, these tests are not always available during the time of the pandemic. Therefore, this study developed an automatic, intelligent, rapid, and real-time diagnostic model for the early detection of COVID-19 based on its symptoms. METHODS: The COVID-19 knowledge graph (KG) constructed based on literature from heterogeneous data is imported to understand the COVID-19 different relations. We added human disease ontology to the COVID-19 KG and applied a node-embedding graph algorithm called fast random projection to extract an extra feature from the COVID-19 dataset. Subsequently, experiments were conducted using two machine learning (ML) pipelines to predict COVID-19 infection from its symptoms. Additionally, automatic tuning of the model hyperparameters was adopted. RESULTS: We compared two graph-based ML models, logistic regression (LR) and random forest (RF) models. The proposed graph-based RF model achieved a small error rate = 0.0064 and the best scores on all performance metrics, including specificity = 98.71%, accuracy = 99.36%, precision = 99.65%, recall = 99.53%, and F1-score = 99.59%. Furthermore, the Matthews correlation coefficient achieved by the RF model was higher than that of the LR model. Comparative analysis with other ML algorithms and with studies from the literature showed that the proposed RF model exhibited the best detection accuracy. 
CONCLUSION: The graph-based RF model registered high performance in classifying the symptoms of COVID-19 infection, thereby indicating that the graph data science, in conjunction with ML techniques, helps improve performance and accelerate innovations. PeerJ Inc. 2023-04-10 /pmc/articles/PMC10280642/ /pubmed/37346701 http://dx.doi.org/10.7717/peerj-cs.1333 Text en © 2023 Alqaissi et al. https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by-nc/4.0/), which permits using, remixing, and building upon the work non-commercially, as long as it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Alqaissi, Eman Alotaibi, Fahd Ramzan, Muhammad Sher Graph data science and machine learning for the detection of COVID-19 infection from symptoms |
title | Graph data science and machine learning for the detection of COVID-19 infection from symptoms |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280642/ https://www.ncbi.nlm.nih.gov/pubmed/37346701 http://dx.doi.org/10.7717/peerj-cs.1333 |
work_keys_str_mv | AT alqaissieman graphdatascienceandmachinelearningforthedetectionofcovid19infectionfromsymptoms AT alotaibifahd graphdatascienceandmachinelearningforthedetectionofcovid19infectionfromsymptoms AT ramzanmuhammadsher graphdatascienceandmachinelearningforthedetectionofcovid19infectionfromsymptoms |
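
The description field reports specificity, accuracy, precision, recall, F1-score, error rate, and the Matthews correlation coefficient for the RF model. As a reading aid, the sketch below shows how these quantities are conventionally derived from a binary confusion matrix. It is not the authors' code: the function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration, since the record does not give the study's confusion matrix. One point can be checked directly from the record itself: the reported error rate of 0.0064 is consistent with 1 minus the reported accuracy of 99.36%.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
        "error_rate": 1 - accuracy,
    }


# Hypothetical counts for illustration only; NOT taken from the study.
m = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")

# Consistency check on the figures quoted in the record:
# accuracy = 99.36%  ->  error rate = 1 - 0.9936
reported_error_rate = round(1 - 0.9936, 4)
print("error rate implied by reported accuracy:", reported_error_rate)
```

Because the MCC uses all four cells of the confusion matrix, it is a useful complement to accuracy when the positive and negative classes are imbalanced.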