
Ensemble learning model for diagnosing COVID-19 from routine blood tests

BACKGROUND AND OBJECTIVES: The pandemic of novel coronavirus disease 2019 (COVID-19) has severely impacted human society with a massive death toll worldwide. There is an urgent need for early and reliable screening of COVID-19 patients to provide better and timely patient care and to combat the spre...


Bibliographic Details
Main Authors:	AlJame, Maryam; Ahmad, Imtiaz; Imtiaz, Ayyub; Mohammed, Ameer
Format:	Online Article Text
Language:	English
Published:	The Author(s). Published by Elsevier Ltd. 2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7572278/ https://www.ncbi.nlm.nih.gov/pubmed/33102686 http://dx.doi.org/10.1016/j.imu.2020.100449

author	AlJame, Maryam; Ahmad, Imtiaz; Imtiaz, Ayyub; Mohammed, Ameer
author_sort	AlJame, Maryam
collection	PubMed
description	BACKGROUND AND OBJECTIVES: The coronavirus disease 2019 (COVID-19) pandemic has severely impacted human society, with a massive death toll worldwide. There is an urgent need for early and reliable screening of COVID-19 patients to provide better and timely patient care and to combat the spread of the disease. In this context, recent studies have reported some key advantages of using routine blood tests for initial screening of COVID-19 patients. In this article, we first review the emerging techniques for COVID-19 diagnosis using routine laboratory and/or clinical data. We then propose ERLX, an ensemble learning model for COVID-19 diagnosis from routine blood tests. METHOD: At the first level, the proposed model uses three well-known, diverse classifiers (extra trees, random forest, and logistic regression) that have different architectures and learning characteristics. It then combines their predictions with a second-level extreme gradient boosting (XGBoost) classifier to achieve better performance. For data preparation, the proposed methodology employs the KNNImputer algorithm to handle null values in the dataset, isolation forest (iForest) to remove outliers, and the synthetic minority oversampling technique (SMOTE) to balance the data distribution. For model interpretability, feature importance is reported using the SHapley Additive exPlanations (SHAP) technique. RESULTS: The proposed model was trained and evaluated on a publicly available dataset from Albert Einstein Hospital in Brazil, which consisted of 5644 data samples with 559 confirmed COVID-19 cases. The ensemble model achieved outstanding performance, with an overall accuracy of 99.88% [95% CI: 99.6–100], an AUC of 99.38% [95% CI: 97.5–100], a sensitivity of 98.72% [95% CI: 94.6–100], and a specificity of 99.99% [95% CI: 99.99–100]. 
DISCUSSION: The proposed model outperformed existing state-of-the-art studies (Banerjee et al., 2020; de Freitas Barbosa et al., 2020; de Moraes Batista et al., 2020; Soares et al., 2020) [3,22,56,71] when evaluated on the same sets of features that those studies employed. Compared with the average accuracy of 95.159% of the best-performing Bayes Net model (de Freitas Barbosa et al., 2020) [22], ERLX achieved an average accuracy of 99.94%. Compared with the AUC of 85% reported for the SVM model (de Moraes Batista et al., 2020) [56], ERLX obtained an AUC of 99.77%, in addition to improvements in sensitivity and specificity. Compared with the ER-COV model (Soares et al., 2020) [71], which had an average sensitivity of 70.25% and a specificity of 85.98%, the ERLX model achieved a sensitivity of 99.47% and a specificity of 99.99%. The ERLX model also scored considerably higher than the ANN model (Banerjee et al., 2020) [3] on all performance metrics. Therefore, the presented model is robust and can be deployed for reliable, early, and rapid screening of COVID-19 patients.
format	Online Article Text
id	pubmed-7572278
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	The Author(s). Published by Elsevier Ltd.
record_format	MEDLINE/PubMed
spelling	pubmed-7572278 2020-10-20 Ensemble learning model for diagnosing COVID-19 from routine blood tests AlJame, Maryam; Ahmad, Imtiaz; Imtiaz, Ayyub; Mohammed, Ameer Inform Med Unlocked Article The Author(s). Published by Elsevier Ltd. 2020 2020-10-20 /pmc/articles/PMC7572278/ /pubmed/33102686 http://dx.doi.org/10.1016/j.imu.2020.100449 Text en © 2020 The Author(s) Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. 
Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title	Ensemble learning model for diagnosing COVID-19 from routine blood tests
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7572278/ https://www.ncbi.nlm.nih.gov/pubmed/33102686 http://dx.doi.org/10.1016/j.imu.2020.100449
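
The METHOD portion of the abstract describes a two-level stacked ensemble with imputation and outlier removal ahead of training. The following is a minimal sketch of that general shape using scikit-learn only; it is not the authors' code. Several things here are assumptions made for illustration: the data is synthetic (`make_toy_data` is a hypothetical helper, not the Albert Einstein Hospital dataset), `GradientBoostingClassifier` stands in for XGBoost as the second-level learner, all hyperparameters are arbitrary, and the SMOTE and SHAP steps are omitted because they need additional packages.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    IsolationForest,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_toy_data(seed=0):
    """Synthetic stand-in for a blood-test table: imbalanced labels, some gaps."""
    X, y = make_classification(
        n_samples=600, n_features=10, n_informative=5,
        weights=[0.9, 0.1], random_state=seed,
    )
    rng = np.random.default_rng(seed)
    X[rng.random(X.shape) < 0.05] = np.nan  # blank out roughly 5% of entries
    return X, y


def build_stack(seed=0):
    """Level 1: extra trees, random forest, logistic regression. Level 2: boosting."""
    level_one = [
        ("extra_trees", ExtraTreesClassifier(n_estimators=100, random_state=seed)),
        ("random_forest", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("log_reg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ]
    # Assumption: scikit-learn's gradient boosting replaces XGBoost in this sketch.
    meta = GradientBoostingClassifier(random_state=seed)
    return StackingClassifier(estimators=level_one, final_estimator=meta, cv=5)


def run_demo(seed=0):
    X, y = make_toy_data(seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Fill missing values from the 5 nearest rows; fit on training data only.
    imputer = KNNImputer(n_neighbors=5)
    X_train = imputer.fit_transform(X_train)
    X_test = imputer.transform(X_test)

    # Keep only training rows the isolation forest labels as inliers (+1).
    forest = IsolationForest(contamination=0.05, random_state=seed)
    inliers = forest.fit_predict(X_train) == 1
    X_train, y_train = X_train[inliers], y_train[inliers]

    model = build_stack(seed).fit(X_train, y_train)
    cm = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])
    tn, fp, fn, tp = cm.ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


if __name__ == "__main__":
    print(run_demo())
```

Fitting the imputer and the outlier detector on the training split alone is a design choice of this sketch, intended to keep test rows from influencing preprocessing; the abstract does not say how the original study ordered these steps. The numbers printed for the toy data have no bearing on the results reported in the record.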
