Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study

BACKGROUND: During a pandemic, it is important for clinicians to stratify patients and decide who receives limited medical resources. Machine learning models have been proposed to accurately predict COVID-19 disease severity. Previous studies have typically tested only one machine learning algorithm...

Bibliographic Details
Main authors:	Ikemura, Kenji; Bellin, Eran; Yagi, Yukako; Billett, Henny; Saada, Mahmoud; Simone, Katelyn; Stahl, Lindsay; Szymanski, James; Goldstein, D Y; Reyes Gil, Morayma
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2021
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7919846/ https://www.ncbi.nlm.nih.gov/pubmed/33539308 http://dx.doi.org/10.2196/23458

_version_	1783658195449806848
author	Ikemura, Kenji; Bellin, Eran; Yagi, Yukako; Billett, Henny; Saada, Mahmoud; Simone, Katelyn; Stahl, Lindsay; Szymanski, James; Goldstein, D Y; Reyes Gil, Morayma
author_facet	Ikemura, Kenji; Bellin, Eran; Yagi, Yukako; Billett, Henny; Saada, Mahmoud; Simone, Katelyn; Stahl, Lindsay; Szymanski, James; Goldstein, D Y; Reyes Gil, Morayma
author_sort	Ikemura, Kenji
collection	PubMed
description	BACKGROUND: During a pandemic, it is important for clinicians to stratify patients and decide who receives limited medical resources. Machine learning models have been proposed to accurately predict COVID-19 disease severity. Previous studies have typically tested only one machine learning algorithm and limited performance evaluation to area under the curve analysis. To obtain the best results possible, it may be important to test different machine learning algorithms to find the best prediction model. OBJECTIVE: In this study, we aimed to use automated machine learning (autoML) to train various machine learning algorithms. We selected the model that best predicted patients’ chances of surviving a SARS-CoV-2 infection. In addition, we identified which variables (ie, vital signs, biomarkers, comorbidities, etc) were the most influential in generating an accurate model. METHODS: Data were retrospectively collected from all patients who tested positive for COVID-19 at our institution between March 1 and July 3, 2020. We collected 48 variables from each patient within 36 hours before or after the index time (ie, real-time polymerase chain reaction positivity). Patients were followed for 30 days or until death. Patients’ data were used to build 20 machine learning models with various algorithms via autoML. The performance of machine learning models was measured by analyzing the area under the precision-recall curve (AUPRC). Subsequently, we established model interpretability via Shapley additive explanation and partial dependence plots to identify and rank variables that drove model predictions. Afterward, we conducted dimensionality reduction to extract the 10 most influential variables. AutoML models were retrained by only using these 10 variables, and the output models were evaluated against the model that used 48 variables. RESULTS: Data from 4313 patients were used to develop the models. The best model that was generated by using autoML and 48 variables was the stacked ensemble model (AUPRC=0.807). The two best independent models were the gradient boost machine and extreme gradient boost models, which had an AUPRC of 0.803 and 0.793, respectively. The deep learning model (AUPRC=0.73) was substantially inferior to the other models. The 10 most influential variables for generating high-performing models were systolic and diastolic blood pressure, age, pulse oximetry level, blood urea nitrogen level, lactate dehydrogenase level, D-dimer level, troponin level, respiratory rate, and Charlson comorbidity score. After the autoML models were retrained with these 10 variables, the stacked ensemble model still had the best performance (AUPRC=0.791). CONCLUSIONS: We used autoML to develop high-performing models that predicted the survival of patients with COVID-19. In addition, we identified important variables that correlated with mortality. This is proof of concept that autoML is an efficient, effective, and informative method for generating machine learning–based clinical decision support tools.
format	Online Article Text
id	pubmed-7919846
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-7919846 2021-03-05 Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study Ikemura, Kenji; Bellin, Eran; Yagi, Yukako; Billett, Henny; Saada, Mahmoud; Simone, Katelyn; Stahl, Lindsay; Szymanski, James; Goldstein, D Y; Reyes Gil, Morayma J Med Internet Res Original Paper BACKGROUND: During a pandemic, it is important for clinicians to stratify patients and decide who receives limited medical resources. Machine learning models have been proposed to accurately predict COVID-19 disease severity. Previous studies have typically tested only one machine learning algorithm and limited performance evaluation to area under the curve analysis. To obtain the best results possible, it may be important to test different machine learning algorithms to find the best prediction model. OBJECTIVE: In this study, we aimed to use automated machine learning (autoML) to train various machine learning algorithms. We selected the model that best predicted patients’ chances of surviving a SARS-CoV-2 infection. In addition, we identified which variables (ie, vital signs, biomarkers, comorbidities, etc) were the most influential in generating an accurate model. METHODS: Data were retrospectively collected from all patients who tested positive for COVID-19 at our institution between March 1 and July 3, 2020. We collected 48 variables from each patient within 36 hours before or after the index time (ie, real-time polymerase chain reaction positivity). Patients were followed for 30 days or until death. Patients’ data were used to build 20 machine learning models with various algorithms via autoML. The performance of machine learning models was measured by analyzing the area under the precision-recall curve (AUPRC). Subsequently, we established model interpretability via Shapley additive explanation and partial dependence plots to identify and rank variables that drove model predictions. Afterward, we conducted dimensionality reduction to extract the 10 most influential variables. AutoML models were retrained by only using these 10 variables, and the output models were evaluated against the model that used 48 variables. RESULTS: Data from 4313 patients were used to develop the models. The best model that was generated by using autoML and 48 variables was the stacked ensemble model (AUPRC=0.807). The two best independent models were the gradient boost machine and extreme gradient boost models, which had an AUPRC of 0.803 and 0.793, respectively. The deep learning model (AUPRC=0.73) was substantially inferior to the other models. The 10 most influential variables for generating high-performing models were systolic and diastolic blood pressure, age, pulse oximetry level, blood urea nitrogen level, lactate dehydrogenase level, D-dimer level, troponin level, respiratory rate, and Charlson comorbidity score. After the autoML models were retrained with these 10 variables, the stacked ensemble model still had the best performance (AUPRC=0.791). CONCLUSIONS: We used autoML to develop high-performing models that predicted the survival of patients with COVID-19. In addition, we identified important variables that correlated with mortality. This is proof of concept that autoML is an efficient, effective, and informative method for generating machine learning–based clinical decision support tools. JMIR Publications 2021-02-26 /pmc/articles/PMC7919846/ /pubmed/33539308 http://dx.doi.org/10.2196/23458 Text en ©Kenji Ikemura, Eran Bellin, Yukako Yagi, Henny Billett, Mahmoud Saada, Katelyn Simone, Lindsay Stahl, James Szymanski, D Y Goldstein, Morayma Reyes Gil. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 26.02.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle	Original Paper Ikemura, Kenji Bellin, Eran Yagi, Yukako Billett, Henny Saada, Mahmoud Simone, Katelyn Stahl, Lindsay Szymanski, James Goldstein, D Y Reyes Gil, Morayma Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study
title	Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study
title_full	Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study
title_fullStr	Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study
title_full_unstemmed	Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study
title_short	Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study
title_sort	using automated machine learning to predict the mortality of patients with covid-19: prediction model development study
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7919846/ https://www.ncbi.nlm.nih.gov/pubmed/33539308 http://dx.doi.org/10.2196/23458
work_keys_str_mv	AT ikemurakenji usingautomatedmachinelearningtopredictthemortalityofpatientswithcovid19predictionmodeldevelopmentstudy AT bellineran usingautomatedmachinelearningtopredictthemortalityofpatientswithcovid19predictionmodeldevelopmentstudy AT yagiyukako usingautomatedmachinelearningtopredictthemortalityofpatientswithcovid19predictionmodeldevelopmentstudy AT billetthenny usingautomatedmachinelearningtopredictthemortalityofpatientswithcovid19predictionmodeldevelopmentstudy AT saadamahmoud usingautomatedmachinelearningtopredictthemortalityofpatientswithcovid19predictionmodeldevelopmentstudy AT simonekatelyn usingautomatedmachinelearningtopredictthemortalityofpatientswithcovid19predictionmodeldevelopmentstudy AT stahllindsay usingautomatedmachinelearningtopredictthemortalityofpatientswithcovid19predictionmodeldevelopmentstudy AT szymanskijames usingautomatedmachinelearningtopredictthemortalityofpatientswithcovid19predictionmodeldevelopmentstudy AT goldsteindy usingautomatedmachinelearningtopredictthemortalityofpatientswithcovid19predictionmodeldevelopmentstudy AT reyesgilmorayma usingautomatedmachinelearningtopredictthemortalityofpatientswithcovid19predictionmodeldevelopmentstudy
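
Note on the metric named in the abstract: the models in this record are compared by the area under the precision-recall curve (AUPRC). The record does not say how the authors' autoML tool computes it, so the following is only a minimal sketch of one common estimate, step-wise average precision, written in plain Python. The function name `average_precision` and the toy labels and scores are hypothetical and are not taken from the study; the sketch also assumes no tied scores.

```python
def average_precision(y_true, y_score):
    """Step-wise average precision, a common summary of the PR curve.

    y_true:  list of 0/1 labels (1 = positive class, e.g. death).
    y_score: list of predicted scores, higher = more likely positive.
    Assumption: scores are not tied; ties are broken by input order.
    """
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank patients from highest to lowest predicted score.
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            # Precision at the rank where this positive is retrieved.
            precision_sum += true_positives / rank

    # Average the precision values over all positives.
    return precision_sum / n_pos


# Toy example (hypothetical data, not from the study).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(average_precision(labels, scores))
```

Unlike the area under the ROC curve, this quantity depends on how rare the positive class is, which is one reason it is often preferred for imbalanced outcomes such as mortality.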
