
Machine Learning Applied to Clinical Laboratory Data in Spain for COVID-19 Outcome Prediction: Model Development and Validation

BACKGROUND: The COVID-19 pandemic is probably the greatest health catastrophe of the modern era. Spain’s health care system has been exposed to uncontrollable numbers of patients over a short period, causing the system to collapse. Given that diagnosis is not immediate, and there is no effective tre...


Bibliographic Details
Main authors: Domínguez-Olmedo, Juan L; Gragera-Martínez, Álvaro; Mata, Jacinto; Pachón Álvarez, Victoria
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8048712/
https://www.ncbi.nlm.nih.gov/pubmed/33793407
http://dx.doi.org/10.2196/26211
_version_ 1783679282871009280
author Domínguez-Olmedo, Juan L
Gragera-Martínez, Álvaro
Mata, Jacinto
Pachón Álvarez, Victoria
author_facet Domínguez-Olmedo, Juan L
Gragera-Martínez, Álvaro
Mata, Jacinto
Pachón Álvarez, Victoria
author_sort Domínguez-Olmedo, Juan L
collection PubMed
description BACKGROUND: The COVID-19 pandemic is probably the greatest health catastrophe of the modern era. Spain’s health care system has been exposed to uncontrollable numbers of patients over a short period, causing the system to collapse. Given that diagnosis is not immediate and there is no effective treatment for COVID-19, other tools have had to be developed to identify patients at risk of severe disease complications and thus optimize material and human resources in health care. There are no tools to identify patients who have a worse prognosis than others. OBJECTIVE: This study aimed to process a sample of electronic health records of patients with COVID-19 in order to develop a machine learning model to predict the severity of infection and mortality from clinical laboratory parameters. Early patient classification can help optimize material and human resources, and analysis of the most important features of the model could provide more detailed insights into the disease. METHODS: After an initial performance evaluation based on a comparison with several other well-known methods, the extreme gradient boosting algorithm was selected as the predictive method for this study. In addition, Shapley Additive Explanations was used to analyze the importance of the features of the resulting model. RESULTS: After data preprocessing, 1823 confirmed patients with COVID-19 and 32 predictor features were selected. On bootstrap validation, the extreme gradient boosting classifier yielded a value of 0.97 (95% CI 0.96-0.98) for the area under the receiver operating characteristic curve, 0.86 (95% CI 0.80-0.91) for the area under the precision-recall curve, 0.94 (95% CI 0.92-0.95) for accuracy, 0.77 (95% CI 0.72-0.83) for the F-score, 0.93 (95% CI 0.89-0.98) for sensitivity, and 0.91 (95% CI 0.86-0.96) for specificity. The 4 most relevant features for model prediction were lactate dehydrogenase activity, C-reactive protein levels, neutrophil counts, and urea levels. CONCLUSIONS: Our predictive model yielded excellent results in identifying patients who died of COVID-19, primarily from laboratory parameter values. Analysis of the resulting model identified a set of features with the most significant impact on the prediction, thus relating them to a higher risk of mortality.
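The threshold-based metrics named in RESULTS (accuracy, F-score, sensitivity, specificity) and their 95% CIs can be illustrated with a minimal sketch. This is not the authors' code: the record does not describe their bootstrap procedure, so a generic percentile bootstrap is assumed here, and the function names (confusion_counts, metrics, bootstrap_ci) and the label coding (1 = died, 0 = survived) are hypothetical. Only the Python standard library is used.

```python
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn); labels assumed to be 1 = died, 0 = survived."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and F-score (F1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": 2 * precision * sensitivity / denom if denom else 0.0,
    }


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample patients with replacement (assumed scheme)."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])
        values.append(resampled[metric])
    values.sort()
    lower = values[int((alpha / 2) * n_boot)]
    upper = values[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Calling metrics(y_true, y_pred) on held-out predictions gives the point estimates, and bootstrap_ci(y_true, y_pred, "sensitivity") gives an interval for one of them. The two area-under-curve values in the abstract need predicted probabilities rather than hard labels and are not covered by this sketch.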
format Online
Article
Text
id pubmed-8048712
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8048712 2021-04-22 Machine Learning Applied to Clinical Laboratory Data in Spain for COVID-19 Outcome Prediction: Model Development and Validation Domínguez-Olmedo, Juan L Gragera-Martínez, Álvaro Mata, Jacinto Pachón Álvarez, Victoria J Med Internet Res Original Paper JMIR Publications 2021-04-14 /pmc/articles/PMC8048712/ /pubmed/33793407 http://dx.doi.org/10.2196/26211 Text en ©Juan L Domínguez-Olmedo, Álvaro Gragera-Martínez, Jacinto Mata, Victoria Pachón Álvarez. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 14.04.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
title Machine Learning Applied to Clinical Laboratory Data in Spain for COVID-19 Outcome Prediction: Model Development and Validation
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8048712/
https://www.ncbi.nlm.nih.gov/pubmed/33793407
http://dx.doi.org/10.2196/26211