A Hybrid Feature Selection Approach to Screen a Novel Set of Blood Biomarkers for Early COVID-19 Mortality Prediction
The increase in coronavirus disease 2019 (COVID-19) infection caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has placed pressure on healthcare services worldwide. Therefore, it is crucial to identify critical factors for the assessment of the severity of COVID-19 infection an...
Main authors: | Syed, Asif Hassan; Khan, Tabrej; Alromema, Nashwan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9316550/ https://www.ncbi.nlm.nih.gov/pubmed/35885508 http://dx.doi.org/10.3390/diagnostics12071604 |
_version_ | 1784754842454982656 |
---|---|
author | Syed, Asif Hassan; Khan, Tabrej; Alromema, Nashwan |
collection | PubMed |
description | The increase in coronavirus disease 2019 (COVID-19) infections caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has placed pressure on healthcare services worldwide. Therefore, it is crucial to identify critical factors for assessing the severity of COVID-19 infection and optimizing individual treatment strategies. In this regard, the present study leverages a dataset of blood samples from 485 COVID-19 individuals in the region of Wuhan, China, to identify essential blood biomarkers that predict the mortality of COVID-19 individuals. For this purpose, a hybrid feature selection approach combining filter, statistical, and heuristic-based methods was used to select the best subset of informative features. Specifically, minimum redundancy maximum relevance (mRMR), a two-tailed unpaired t-test, and the whale optimization algorithm (WOA) were applied, and together they selected the three most informative blood biomarkers: international normalized ratio (INR), platelet large cell ratio (P-LCR), and D-dimer. In addition, various machine learning (ML) algorithms (random forest (RF), support vector machine (SVM), extreme gradient boosting (EGB), naïve Bayes (NB), logistic regression (LR), and k-nearest neighbor (KNN)) were trained. The performance of the trained models was compared to determine which model predicts the mortality of COVID-19 individuals with the highest accuracy, F1 score, and area under the curve (AUC) values. The best-performing RF-based model, built using the three most informative blood parameters, predicts the mortality of COVID-19 individuals with an accuracy of 0.96 ± 0.062, an F1 score of 0.96 ± 0.099, and an AUC of 0.98 ± 0.024 on the independent test data. Furthermore, the performance of the proposed RF-based model in terms of accuracy, F1 score, and AUC was significantly better than that of the known blood-biomarker-based ML models built using the Pre_Surv_COVID_19 data. Therefore, the present study provides a novel hybrid approach to screen the most informative blood biomarkers and develop an RF-based model that accurately and reliably predicts in-hospital mortality of confirmed COVID-19 individuals during surge periods. An application based on the proposed model was implemented and deployed on Heroku. |
format | Online Article Text |
id | pubmed-9316550 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9316550 2022-07-27 Syed, Asif Hassan; Khan, Tabrej; Alromema, Nashwan. Diagnostics (Basel). Article. MDPI 2022-06-30 /pmc/articles/PMC9316550/ /pubmed/35885508 http://dx.doi.org/10.3390/diagnostics12071604 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | A Hybrid Feature Selection Approach to Screen a Novel Set of Blood Biomarkers for Early COVID-19 Mortality Prediction |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9316550/ https://www.ncbi.nlm.nih.gov/pubmed/35885508 http://dx.doi.org/10.3390/diagnostics12071604 |
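
The abstract names a two-tailed unpaired t-test as one screening step and compares models by accuracy, F1 score, and AUC. The sketch below illustrates those quantities using only the Python standard library. It is not the authors' pipeline: the function names are hypothetical, the numbers are made-up toy values rather than data from the Wuhan cohort, the t statistic assumes equal variances (the record does not say which variant was used), and the mRMR and WOA steps are left out entirely.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t_statistic(a, b):
    """Student's unpaired t statistic with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values for one hypothetical marker in two outcome groups (not real measurements).
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
t_stat = unpaired_t_statistic(group_a, group_b)

# Toy labels (1 = died, 0 = survived) and hypothetical model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

print(round(t_stat, 4))           # -3.6742
print(accuracy(y_true, y_pred))   # 0.75
print(f1_score(y_true, y_pred))   # 0.75
print(roc_auc(y_true, scores))    # 0.9375
```

A larger absolute t statistic indicates a marker whose mean differs more between the two groups relative to its spread. Turning it into a two-tailed p-value requires the t distribution, which the standard library does not provide.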