Usefulness of Random Forest Algorithm in Predicting Severe Acute Pancreatitis
BACKGROUND AND AIMS: This study aimed to develop an interpretable random forest model for predicting severe acute pancreatitis (SAP). METHODS: Clinical and laboratory data of 648 patients with acute pancreatitis were retrospectively reviewed and randomly assigned to the training set and test set in...
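Main authors: | Hong, Wandong; Lu, Yajing; Zhou, Xiaoying; Jin, Shengchun; Pan, Jingyi; Lin, Qingyi; Yang, Shaopeng; Basharat, Zarrin; Zippi, Maddalena; Goyal, Hemant |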
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Cellular and Infection Microbiology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9226542/ https://www.ncbi.nlm.nih.gov/pubmed/35755843 http://dx.doi.org/10.3389/fcimb.2022.893294 |
_version_ | 1784733924367269888 |
---|---|
author | Hong, Wandong; Lu, Yajing; Zhou, Xiaoying; Jin, Shengchun; Pan, Jingyi; Lin, Qingyi; Yang, Shaopeng; Basharat, Zarrin; Zippi, Maddalena; Goyal, Hemant |
author_sort | Hong, Wandong |
collection | PubMed |
description | BACKGROUND AND AIMS: This study aimed to develop an interpretable random forest model for predicting severe acute pancreatitis (SAP). METHODS: Clinical and laboratory data of 648 patients with acute pancreatitis were retrospectively reviewed and randomly assigned to a training set and a test set in a 3:1 ratio. Univariate analysis was used to select candidate predictors of SAP. Random forest (RF) and logistic regression (LR) models were developed on the training sample and then applied to the test sample. The performance of the risk models was measured by the area under the receiver operating characteristic (ROC) curve (AUC) and the area under the precision-recall curve. Visual interpretation was provided using local interpretable model-agnostic explanations (LIME). RESULTS: The LR model was developed to predict SAP as the following function: -1.10 - 0.13 × albumin (g/L) + 0.016 × serum creatinine (μmol/L) + 0.14 × glucose (mmol/L) + 1.63 × pleural effusion (0 = no, 1 = yes). The coefficients of this formula were used to build a nomogram. The RF model consists of 16 variables identified by univariate analysis. It was developed and validated by tenfold cross-validation on the training sample. Variable importance analysis suggested that blood urea nitrogen, serum creatinine, albumin, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, calcium, and glucose were the seven most important predictors of SAP. The AUCs of the RF model in tenfold cross-validation on the training set and on the test set were 0.89 and 0.96, respectively. Both the area under the precision-recall curve and the diagnostic accuracy of the RF model were higher than those of both the LR model and the BISAP score. LIME plots were used to explain individualized predictions of the RF model. CONCLUSIONS: An interpretable RF model exhibited the highest discriminatory performance in predicting SAP. Interpretation with LIME plots could be useful for individualized prediction in a clinical setting. A nomogram consisting of albumin, serum creatinine, glucose, and pleural effusion was useful for prediction of SAP. |
format | Online Article Text |
id | pubmed-9226542 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9226542 2022-06-25 Usefulness of Random Forest Algorithm in Predicting Severe Acute Pancreatitis Hong, Wandong Lu, Yajing Zhou, Xiaoying Jin, Shengchun Pan, Jingyi Lin, Qingyi Yang, Shaopeng Basharat, Zarrin Zippi, Maddalena Goyal, Hemant Front Cell Infect Microbiol Cellular and Infection Microbiology Frontiers Media S.A. 2022-06-10 /pmc/articles/PMC9226542/ /pubmed/35755843 http://dx.doi.org/10.3389/fcimb.2022.893294 Text en Copyright © 2022 Hong, Lu, Zhou, Jin, Pan, Lin, Yang, Basharat, Zippi and Goyal https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Usefulness of Random Forest Algorithm in Predicting Severe Acute Pancreatitis |
topic | Cellular and Infection Microbiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9226542/ https://www.ncbi.nlm.nih.gov/pubmed/35755843 http://dx.doi.org/10.3389/fcimb.2022.893294 |
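
The logistic regression function quoted in the description field can be evaluated by hand or with a few lines of code. The sketch below is illustrative only: it assumes the quoted expression is the linear predictor (log-odds) of a standard logistic model, so that a probability follows from the logistic function. The record itself does not state this, and the function names and example patient values are hypothetical, not taken from the study.

```python
import math

# Coefficients exactly as quoted in the record's description field.
INTERCEPT = -1.10
COEF_ALBUMIN = -0.13           # per g/L
COEF_CREATININE = 0.016        # per μmol/L
COEF_GLUCOSE = 0.14            # per mmol/L
COEF_PLEURAL_EFFUSION = 1.63   # indicator: 0 = no, 1 = yes


def sap_linear_predictor(albumin_g_l, creatinine_umol_l, glucose_mmol_l,
                         pleural_effusion):
    """Evaluate the quoted LR function for one patient.

    pleural_effusion must be 0 (absent) or 1 (present).
    """
    if pleural_effusion not in (0, 1):
        raise ValueError("pleural_effusion must be 0 or 1")
    return (INTERCEPT
            + COEF_ALBUMIN * albumin_g_l
            + COEF_CREATININE * creatinine_umol_l
            + COEF_GLUCOSE * glucose_mmol_l
            + COEF_PLEURAL_EFFUSION * pleural_effusion)


def sap_probability(albumin_g_l, creatinine_umol_l, glucose_mmol_l,
                    pleural_effusion):
    """Map the linear predictor to a probability.

    ASSUMPTION: the quoted function is the log-odds of SAP, so the
    logistic function 1 / (1 + exp(-x)) converts it to a probability.
    """
    x = sap_linear_predictor(albumin_g_l, creatinine_umol_l,
                             glucose_mmol_l, pleural_effusion)
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    without_effusion = sap_probability(40.0, 80.0, 6.0, 0)
    with_effusion = sap_probability(40.0, 80.0, 6.0, 1)
    print(f"no pleural effusion:   {without_effusion:.4f}")
    print(f"with pleural effusion: {with_effusion:.4f}")
```

Under this reading, lower albumin, higher creatinine, higher glucose, and the presence of pleural effusion each raise the predicted risk, which matches the signs of the quoted coefficients.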