A descriptive study of random forest algorithm for predicting COVID-19 patients outcome

Bibliographic Details
Main authors: Wang, Jie; Yu, Heping; Hua, Qingquan; Jing, Shuili; Liu, Zhifen; Peng, Xiang; Cao, Cheng’an; Luo, Yongwen
Format: Online, Article, Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Emergency and Critical Care
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7486830/
https://www.ncbi.nlm.nih.gov/pubmed/32974109
http://dx.doi.org/10.7717/peerj.9945
collection PubMed
description BACKGROUND: The outbreak of coronavirus disease 2019 (COVID-19), which began in Wuhan, China, has become a global public health threat. Indicators that can serve as optimal predictors of clinical outcomes in COVID-19 patients need to be identified. METHODS: Clinical information on 126 patients diagnosed with COVID-19 was collected at Wuhan Fourth Hospital. Clinical characteristics, laboratory findings, treatments and clinical outcomes were analyzed for patients hospitalized for treatment between 1 February and 15 March 2020 who subsequently died or were discharged. A random forest (RF) algorithm was used to predict the prognoses of COVID-19 patients and to identify the optimal diagnostic predictors of clinical prognosis. RESULTS: Seven of the 126 patients were excluded because their endpoints were missing; of the remaining 119 patients, 103 were discharged alive and 16 died in the hospital. A synthetic minority over-sampling technique (SMOTE) was used to correct the class imbalance between discharged and deceased patients. Recursive feature elimination (RFE) was used to select the optimal feature subset for analysis. Eleven clinical parameters were chosen: myoglobin (Myo), CD8, age, lactate dehydrogenase (LDH), lymphocyte-to-monocyte ratio (LMR), CD45, Th/Ts, dyspnea, neutrophil-to-lymphocyte ratio (NLR), D-dimer and creatine kinase (CK). This subset reached an area under the ROC curve (AUC) of approximately 0.9905. The RF model built on this best subset to predict the prognoses of COVID-19 patients achieved an AUC of 100% on the test data. Moreover, LDH and Myo were selected as the two optimal clinical risk predictors based on the Gini index. Univariable logistic analysis revealed a substantial increase in the risk of in-hospital mortality when Myo was higher than 80 ng/mL (OR = 7.54, 95% CI [3.42–16.63]) and when LDH was higher than 500 U/L (OR = 4.90, 95% CI [2.13–11.25]).
CONCLUSION: We applied an RF algorithm to predict the mortality of COVID-19 patients with high accuracy, and identified LDH higher than 500 U/L and Myo higher than 80 ng/mL as potential prognostic risk factors for COVID-19 patients in the early stage of the disease.
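The abstract relies on two concrete computations. The first is SMOTE oversampling to offset the imbalance between 103 discharged and 16 deceased patients. The second is odds ratios with 95% confidence intervals from univariable logistic analysis of dichotomized markers (e.g. Myo > 80 ng/mL). Below is a minimal, standard-library Python sketch of both. It assumes the basic SMOTE interpolation rule and a Wald interval for a single binary predictor. The function names `smote` and `odds_ratio_wald`, the defaults (`k=5`, `z=1.96`, `seed=0`) and all example values are hypothetical illustrations. They are not the authors' code, settings or data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE (hypothetical helper): create n_new synthetic minority samples.

    Each synthetic sample lies on the segment between a randomly chosen
    minority sample x and one of its k nearest minority-class neighbours nn:
        new = x + gap * (nn - x),  gap ~ Uniform[0, 1)
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples exist
    rng = random.Random(seed)  # seeded for reproducibility
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x by Euclidean distance, excluding x itself
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table (hypothetical helper).

    a = marker high & died,  b = marker high & discharged,
    c = marker low & died,   d = marker low & discharged.
    With one binary predictor, exp(beta) from a univariable logistic
    regression equals this OR, and its Wald interval equals this CI.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the Wald interval undefined")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical inputs for illustration only (not the study's data).
minority_points = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority_points, n_new=3, k=2)

or_, ci_low, ci_high = odds_ratio_wald(a=10, b=20, c=5, d=80)
print(f"OR = {or_:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

In a typical workflow, SMOTE is applied only to the training split. This keeps synthetic samples derived from test patients from inflating test-set metrics such as AUC.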
id pubmed-7486830
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
PeerJ Inc. 2020-09-09. © 2020 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
topic Emergency and Critical Care