Cargando…

Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods

BACKGROUND: Coronavirus disease (COVID-19) presents an unprecedented threat to global health worldwide. Accurately predicting the mortality risk among the infected individuals is crucial for prioritizing medical care and mitigating the healthcare system’s burden. The present study aimed to assess th...

Descripción completa

Detalles Bibliográficos
Autores principales:	Feng, Cindy, Kephart, George, Juarez-Colunga, Elizabeth
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2021
Materias:	Research
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8627169/ https://www.ncbi.nlm.nih.gov/pubmed/34837951 http://dx.doi.org/10.1186/s12874-021-01441-4

_version_	1784606802134958080
author	Feng, Cindy Kephart, George Juarez-Colunga, Elizabeth
author_facet	Feng, Cindy Kephart, George Juarez-Colunga, Elizabeth
author_sort	Feng, Cindy
collection	PubMed
description	BACKGROUND: Coronavirus disease (COVID-19) presents an unprecedented threat to global health worldwide. Accurately predicting the mortality risk among the infected individuals is crucial for prioritizing medical care and mitigating the healthcare system’s burden. The present study aimed to assess the predictive accuracy of machine learning methods to predict the COVID-19 mortality risk. METHODS: We compared the performance of classification tree, random forest (RF), extreme gradient boosting (XGBoost), logistic regression, generalized additive model (GAM) and linear discriminant analysis (LDA) to predict the mortality risk among 49,216 COVID-19 positive cases in Toronto, Canada, reported from March 1 to December 10, 2020. We used repeated split-sample validation and k-steps-ahead forecasting validation. Predictive models were estimated using training samples, and predictive accuracy of the methods for the testing samples was assessed using the area under the receiver operating characteristic curve, Brier’s score, calibration intercept and calibration slope. RESULTS: We found XGBoost is highly discriminative, with an AUC of 0.9669 and has superior performance over conventional tree-based methods, i.e., classification tree or RF methods for predicting COVID-19 mortality risk. Regression-based methods (logistic, GAM and LASSO) had comparable performance to the XGBoost with slightly lower AUCs and higher Brier’s scores. CONCLUSIONS: XGBoost offers superior performance over conventional tree-based methods and minor improvement over regression-based methods for predicting COVID-19 mortality risk in the study population.
format	Online Article Text
id	pubmed-8627169
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-86271692021-11-29 Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods Feng, Cindy Kephart, George Juarez-Colunga, Elizabeth BMC Med Res Methodol Research BACKGROUND: Coronavirus disease (COVID-19) presents an unprecedented threat to global health worldwide. Accurately predicting the mortality risk among the infected individuals is crucial for prioritizing medical care and mitigating the healthcare system’s burden. The present study aimed to assess the predictive accuracy of machine learning methods to predict the COVID-19 mortality risk. METHODS: We compared the performance of classification tree, random forest (RF), extreme gradient boosting (XGBoost), logistic regression, generalized additive model (GAM) and linear discriminant analysis (LDA) to predict the mortality risk among 49,216 COVID-19 positive cases in Toronto, Canada, reported from March 1 to December 10, 2020. We used repeated split-sample validation and k-steps-ahead forecasting validation. Predictive models were estimated using training samples, and predictive accuracy of the methods for the testing samples was assessed using the area under the receiver operating characteristic curve, Brier’s score, calibration intercept and calibration slope. RESULTS: We found XGBoost is highly discriminative, with an AUC of 0.9669 and has superior performance over conventional tree-based methods, i.e., classification tree or RF methods for predicting COVID-19 mortality risk. Regression-based methods (logistic, GAM and LASSO) had comparable performance to the XGBoost with slightly lower AUCs and higher Brier’s scores. CONCLUSIONS: XGBoost offers superior performance over conventional tree-based methods and minor improvement over regression-based methods for predicting COVID-19 mortality risk in the study population. BioMed Central 2021-11-27 /pmc/articles/PMC8627169/ /pubmed/34837951 http://dx.doi.org/10.1186/s12874-021-01441-4 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Feng, Cindy Kephart, George Juarez-Colunga, Elizabeth Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods
title	Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods
title_full	Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods
title_fullStr	Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods
title_full_unstemmed	Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods
title_short	Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods
title_sort	predicting covid-19 mortality risk in toronto, canada: a comparison of tree-based and regression-based machine learning methods
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8627169/ https://www.ncbi.nlm.nih.gov/pubmed/34837951 http://dx.doi.org/10.1186/s12874-021-01441-4
work_keys_str_mv	AT fengcindy predictingcovid19mortalityriskintorontocanadaacomparisonoftreebasedandregressionbasedmachinelearningmethods AT kephartgeorge predictingcovid19mortalityriskintorontocanadaacomparisonoftreebasedandregressionbasedmachinelearningmethods AT juarezcolungaelizabeth predictingcovid19mortalityriskintorontocanadaacomparisonoftreebasedandregressionbasedmachinelearningmethods

Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods

Ejemplares similares