Cargando…

Explaining machine learning based diagnosis of COVID-19 from routine blood tests with decision trees and criteria graphs

The sudden outbreak of coronavirus disease 2019 (COVID-19) revealed the need for fast and reliable automatic tools to help health teams. This paper aims to present understandable solutions based on Machine Learning (ML) techniques to deal with COVID-19 screening in routine blood tests. We tested dif...

Descripción completa

Detalles Bibliográficos
Autores principales: Alves, Marcos Antonio, Castro, Giulia Zanon, Oliveira, Bruno Alberto Soares, Ferreira, Leonardo Augusto, Ramírez, Jaime Arturo, Silva, Rodrigo, Guimarães, Frederico Gadelha
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Elsevier Ltd. 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7962588/
https://www.ncbi.nlm.nih.gov/pubmed/33812263
http://dx.doi.org/10.1016/j.compbiomed.2021.104335
Descripción
Sumario:The sudden outbreak of coronavirus disease 2019 (COVID-19) revealed the need for fast and reliable automatic tools to help health teams. This paper aims to present understandable solutions based on Machine Learning (ML) techniques to deal with COVID-19 screening in routine blood tests. We tested different ML classifiers in a public dataset from the Hospital Albert Einstein, São Paulo, Brazil. After cleaning and pre-processing the data has 608 patients, of which 84 are positive for COVID-19 confirmed by RT-PCR. To understand the model decisions, we introduce (i) a local Decision Tree Explainer (DTX) for local explanation and (ii) a Criteria Graph to aggregate these explanations and portrait a global picture of the results. Random Forest (RF) classifier achieved the best results (accuracy 0.88, F1–score 0.76, sensitivity 0.66, specificity 0.91, and AUROC 0.86). By using DTX and Criteria Graph for cases confirmed by the RF, it was possible to find some patterns among the individuals able to aid the clinicians to understand the interconnection among the blood parameters either globally or on a case-by-case basis. The results are in accordance with the literature and the proposed methodology may be embedded in an electronic health record system.