
Efficient analysis of COVID-19 clinical data using machine learning models



Bibliographic details
Main authors: Ali, Sarwan, Zhou, Yijing, Patterson, Murray
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9066140/
https://www.ncbi.nlm.nih.gov/pubmed/35507111
http://dx.doi.org/10.1007/s11517-022-02570-8
collection PubMed
description Because of the rapid spread of COVID-19 to almost every part of the globe, huge volumes of data and case studies have been made available, giving researchers an unprecedented opportunity to find trends and make discoveries by leveraging such big data. This data comes in many different varieties and can have different levels of veracity, e.g., precise, imprecise, uncertain, and missing, which makes it challenging to extract meaningful information from it. Yet efficient analysis of this continuously growing and evolving COVID-19 data is crucial to inform, often in real time, the measures needed for controlling, mitigating, and ultimately avoiding viral spread. Applying machine learning-based algorithms to this big data is a natural approach to this end, since such algorithms can quickly scale to the data and extract the relevant information despite its variety and differing levels of veracity. This matters not only for COVID-19 but also for potential future pandemics. In this paper, we design a straightforward encoding of clinical data (on categorical attributes) into a fixed-length feature vector representation and then propose a model that first performs efficient feature selection on this representation. We apply this approach to two clinical datasets of COVID-19 patients and then apply different machine learning algorithms downstream for classification. We show that with the efficient feature selection algorithm, we can achieve a prediction accuracy of more than 90% in most cases. We also compute the importance of the different attributes in the datasets using information gain. This can help policymakers focus on only certain attributes when studying this disease, rather than on multiple random factors that may not be very informative of patient outcomes. GRAPHICAL ABSTRACT: [image not reproduced]
id pubmed-9066140
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9066140 2022-05-04. Med Biol Eng Comput, Original Article. Springer Berlin Heidelberg, 2022-05-04. © International Federation for Medical and Biological Engineering 2022. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
topic Original Article
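The abstract names two concrete steps: encoding categorical clinical attributes into a fixed-length feature vector, and scoring attribute importance with information gain. The Python below is a minimal sketch of both, under stated assumptions: it assumes one-hot encoding (the record does not say which encoding the authors use), and the attribute names (`sex`, `fever`), values, and labels are hypothetical toy data, not taken from the paper's two datasets. It does not reproduce the paper's feature-selection model or its downstream classifiers.

```python
import math
from collections import Counter

# Toy data, purely illustrative: attribute names, values, and labels are
# hypothetical and are NOT taken from the paper's two clinical datasets.
ATTRIBUTES = ["sex", "fever"]
RECORDS = [
    {"sex": "F", "fever": "yes"},
    {"sex": "M", "fever": "yes"},
    {"sex": "F", "fever": "no"},
    {"sex": "M", "fever": "no"},
]
LABELS = ["hospitalized", "hospitalized", "recovered", "recovered"]


def one_hot_encode(records, attributes):
    """Encode each record as a fixed-length 0/1 vector.

    Assumption: one-hot encoding, with one slot per (attribute, value) pair
    observed in `records`. The paper's exact encoding may differ.
    """
    # Sort values per attribute so every record shares the same slot layout.
    vocab = [(a, v) for a in attributes for v in sorted({r[a] for r in records})]
    slot = {pair: i for i, pair in enumerate(vocab)}
    vectors = []
    for r in records:
        vec = [0] * len(vocab)
        for a in attributes:
            vec[slot[(a, r[a])]] = 1  # exactly one active slot per attribute
        vectors.append(vec)
    return vocab, vectors


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(records, labels, attribute):
    """IG(Y; A) = H(Y) - sum over values v of P(A = v) * H(Y | A = v)."""
    n = len(records)
    groups = {}
    for r, y in zip(records, labels):
        groups.setdefault(r[attribute], []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(records, labels, attributes):
    """Return attributes ordered from most to least informative."""
    return sorted(attributes,
                  key=lambda a: information_gain(records, labels, a),
                  reverse=True)


if __name__ == "__main__":
    vocab, vectors = one_hot_encode(RECORDS, ATTRIBUTES)
    print(vocab)
    print(vectors)
    for a in ATTRIBUTES:
        print(a, information_gain(RECORDS, LABELS, a))
    print(rank_attributes(RECORDS, LABELS, ATTRIBUTES))
```

On this toy data, `fever` perfectly separates the two labels, so it receives an information gain of 1 bit, while `sex` receives 0 bits. Ranking attributes this way is one simple way to decide which attributes to study; it is not necessarily the feature-selection model the paper proposes.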