
Generalizable prediction of COVID-19 mortality on worldwide patient data

Bibliographic Details
Main Authors: Edelson, Maxim; Kuo, Tsung-Ting
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9129227/
https://www.ncbi.nlm.nih.gov/pubmed/35663116
http://dx.doi.org/10.1093/jamiaopen/ooac036
Description
Summary:
OBJECTIVE: Predicting Coronavirus disease 2019 (COVID-19) mortality for patients is critical for early-stage care and intervention. Existing studies mainly built models on datasets with limited geographical range or size. In this study, we developed COVID-19 mortality prediction models on worldwide, large-scale “sparse” data and on a “dense” subset of the data.
MATERIALS AND METHODS: We evaluated 6 classifiers, including logistic regression (LR), support vector machine (SVM), random forest (RF), multilayer perceptron (MLP), AdaBoost (AB), and Naive Bayes (NB). We also conducted temporal analysis and calibrated our models using Isotonic Regression.
RESULTS: The results showed that AB outperformed the other classifiers for the sparse dataset, while LR provided the highest-performing results for the dense dataset (with area under the receiver operating characteristic curve, or AUC ≈ 0.7 for the sparse dataset and AUC = 0.963 for the dense one). We also identified impactful features such as symptoms, countries, age, and the date of death/discharge. All our models are well-calibrated (P > .1).
DISCUSSION: Our results highlight the tradeoff of using sparse training data to increase generalizability versus training on denser data, which produces higher discrimination results. We found that covariates such as patient information on symptoms, countries (where the case was reported), age, and the date of discharge from the hospital or death were the most important for mortality prediction.
CONCLUSION: This study is a stepping-stone towards improving healthcare quality during the COVID-19 era and potentially other pandemics. Our code is publicly available at: https://doi.org/10.5281/zenodo.6336231.
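The summary reports discrimination as AUC and calibration via Isotonic Regression. As a rough illustration of what those two quantities compute, the sketch below implements both with the Python standard library only. It is not the authors' code (that is at the Zenodo link above). The function names `auc` and `isotonic_fit` and the toy scores and labels are invented here for illustration, and the isotonic fit assumes distinct scores for simplicity.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def isotonic_fit(scores: Sequence[float], labels: Sequence[int]) -> List[float]:
    """Pool-adjacent-violators fit; returns calibrated values in input order.

    Assumption: scores are distinct (tied scores are not pooled here).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks: List[List[float]] = []  # each block is [sum of labels, count]
    for i in order:
        blocks.append([float(labels[i]), 1])
        # Merge backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    calibrated = [0.0] * len(scores)
    start = 0
    for total, count in blocks:
        for i in order[start:start + int(count)]:
            calibrated[i] = total / count
        start += int(count)
    return calibrated


# Hypothetical toy data: raw model scores and observed outcomes (1 = died).
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [0, 0, 1, 1, 0, 1]

print("AUC:", auc(scores, labels))
print("Calibrated:", isotonic_fit(scores, labels))
```

The calibrated values are non-decreasing in the raw score, which is the defining constraint of isotonic regression. A real analysis would fit the calibrator on held-out data rather than on the same samples used for evaluation.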