
Interpretable machine-learning model for Predicting the Convalescent COVID-19 patients with pulmonary diffusing capacity impairment

Bibliographic Details
Main authors: Ma, Fu-qiang, He, Cong, Yang, Hao-ran, Hu, Zuo-wei, Mao, He-rong, Fan, Cun-yu, Qi, Yu, Zhang, Ji-xian, Xu, Bo
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10466769/
https://www.ncbi.nlm.nih.gov/pubmed/37644543
http://dx.doi.org/10.1186/s12911-023-02192-6
collection PubMed
description INTRODUCTION: Patients convalescing from COVID-19 often show pulmonary diffusing capacity impairment (PDCI). Pulmonary diffusing capacity is a frequently used indicator of the pulmonary-function prognosis of COVID-19 survivors, but few studies have focused on predicting it in this population. The aim of this study was to develop and validate a machine learning (ML) model that predicts PDCI in COVID-19 patients from routinely available clinical data, thereby assisting clinical diagnosis.

METHODS: Data were collected in a follow-up study, conducted from August to September 2021, of 221 COVID-19 survivors 18 months after their discharge from hospitals in Wuhan. The data, comprising demographic characteristics and clinical examination results, were randomly split into a training set (80%) and a validation set (20%). Six popular ML models were developed to predict the pulmonary diffusing capacity of patients recovering from COVID-19. Model performance was assessed by the area under the curve (AUC), accuracy, recall, precision, positive predictive value (PPV), negative predictive value (NPV) and F1 score. The best-performing model was designated the optimal model and was used in the interpretability analysis. The MAHAKIL method was used to balance the class distribution of the samples, and recursive feature elimination with cross-validation (RFECV) was used to select the combination of features best suited to machine learning.

RESULTS: A total of 221 COVID-19 survivors were recruited after discharge from hospitals in Wuhan. Of these participants, 117 (52.94%) were female; the median age was 58.2 years (standard deviation (SD) = 12). After feature selection, 31 of the 37 clinical factors were retained to construct the model. Among the six ML models tested, XGBoost performed best, with an AUC of 0.755 and an accuracy of 78.01% after experimental verification. The SHapley Additive exPlanations (SHAP) summary analysis showed that hemoglobin (Hb), maximal voluntary ventilation (MVV), severity of illness, platelet count (PLT), uric acid (UA) and blood urea nitrogen (BUN) were the six most important factors driving the XGBoost model's decisions.

CONCLUSION: The XGBoost model reported here showed good prognostic performance in predicting PDCI in COVID-19 survivors during the recovery period. In the importance analysis based on SHAP values, Hb and MVV contributed most to predicting PDCI outcomes in these survivors.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02192-6.
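The abstract names seven evaluation measures without defining them. As a minimal sketch, assuming a binary PDCI label (1 = impaired, 0 = not impaired) and a model that outputs one probability per patient, they can be computed from the confusion matrix and the ranked scores as below. Note that precision and PPV are the same quantity, TP / (TP + FP). The 0.5 decision threshold, the function names and the toy data are hypothetical illustrations, not taken from the study.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels, where 1 = PDCI present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_div(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (e.g. no predicted positives)."""
    return num / den if den else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores above a random negative.

    Ties count as half a win, i.e. the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true: Sequence[int], scores: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Compute the seven measures named in the abstract (threshold=0.5 is an assumption)."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    y_pred: List[int] = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)  # precision and PPV are the same quantity
    recall = safe_div(tp, tp + fn)     # also called sensitivity
    return {
        "AUC": roc_auc(y_true, scores),  # threshold-free: uses the raw scores
        "Accuracy": safe_div(tp + tn, len(y_true)),
        "Recall": recall,
        "Precision": precision,
        "PPV": precision,
        "NPV": safe_div(tn, tn + fn),
        "F1": safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical toy data for 10 patients; not data from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.55, 0.2, 0.1, 0.35, 0.05]
    for name, value in binary_metrics(y_true, scores).items():
        print(f"{name}: {value:.3f}")
```

On the toy data (TP = 3, TN = 4, FP = 2, FN = 1) the sketch gives accuracy 0.7, precision/PPV 0.6, recall 0.75, NPV 0.8, F1 of about 0.667 and AUC of about 0.833 (20 of 24 positive/negative pairs correctly ordered). AUC is computed from the scores alone, whereas the other six measures depend on the chosen threshold.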
id pubmed-10466769
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Med Inform Decis Mak (Research), BioMed Central, 2023-08-29; record pubmed-10466769 (2023-08-31)
rights © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research