An explainable artificial intelligence framework for risk prediction of COPD in smokers
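Main authors: | Wang, Xuchun; Qiao, Yuchao; Cui, Yu; Ren, Hao; Zhao, Ying; Linghu, Liqin; Ren, Jiahui; Zhao, Zhiyang; Chen, Limin; Qiu, Lixia |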
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10626705/ https://www.ncbi.nlm.nih.gov/pubmed/37932692 http://dx.doi.org/10.1186/s12889-023-17011-w |
author | Wang, Xuchun; Qiao, Yuchao; Cui, Yu; Ren, Hao; Zhao, Ying; Linghu, Liqin; Ren, Jiahui; Zhao, Zhiyang; Chen, Limin; Qiu, Lixia |
---|---|
collection | PubMed |
description | BACKGROUND: Because the early signs of Chronic Obstructive Pulmonary Disease (COPD) are inconspicuous, many affected individuals remain unidentified and miss opportunities for timely prevention and treatment. The purpose of this study was to create an explainable artificial intelligence framework that combines data preprocessing, machine learning, and model interpretability methods to identify people at high risk of COPD in the smoking population and to provide a reasonable interpretation of model predictions. METHODS: The data comprised questionnaire information, physical examination data, and results of pulmonary function tests before and after bronchodilatation. First, factor analysis of mixed data (FAMD), Boruta, and NRSBoundary-SMOTE resampling were used to address missing data, high dimensionality, and class imbalance, respectively. Then, seven classification models (CatBoost, NGBoost, XGBoost, LightGBM, random forest, SVM, and logistic regression) were applied to model the risk level, and the decisions of the best machine learning (ML) model were explained using the Shapley additive explanations (SHAP) method and partial dependence plots (PDPs). RESULTS: In the smoking population, age and 14 other variables were significant factors for predicting COPD. The CatBoost, random forest, and logistic regression models performed reasonably well on the imbalanced datasets. When the composite indicators (AUC, F1-score, and G-mean) were used as model comparison criteria, CatBoost with NRSBoundary-SMOTE had the best classification performance on the balanced datasets. Age, COPD Assessment Test (CAT) score, gross annual income, body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), anhelation (shortness of breath), respiratory disease, central obesity, use of polluting fuel for household heating, region, use of polluting fuel for household cooking, and wheezing were important factors for predicting COPD in the smoking population. CONCLUSION: This study combined feature screening methods, imbalanced-data processing methods, and advanced machine learning methods to enable early identification of COPD risk groups in the smoking population. COPD risk factors in the smoking population were identified using SHAP and PDPs, with the goal of providing theoretical support for targeted screening strategies and self-management strategies for smokers. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12889-023-17011-w. |
id | pubmed-10626705 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | BMC Public Health (Research), BioMed Central, published 2023-11-06 |
rights | © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
topic | Research |
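The description reports that the models were compared with AUC, F1-score and G-mean because the data were imbalanced. The following is a minimal, hypothetical sketch (not the authors' code) of how those three indicators can be computed in plain Python. It assumes binary labels (1 = COPD, 0 = no COPD), predicted probabilities, and an assumed decision threshold of 0.5. The abstract does not say how the indicators were combined into a composite, so the hypothetical `evaluate` helper simply reports them side by side.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = COPD, 0 = no COPD)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (positive recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count 1/2). O(n_pos * n_neg), fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Hypothetical helper: threshold the scores, then report all three metrics."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return {
        "auc": roc_auc(y_true, scores),
        "f1": f1_score(y_true, y_pred),
        "g_mean": g_mean(y_true, y_pred),
    }


if __name__ == "__main__":
    # Toy, made-up data (not from the study): 2 COPD cases, 4 non-cases.
    y_true = [1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.2, 0.1, 0.3]
    print(evaluate(y_true, scores))  # auc 0.875, f1 0.5, g_mean ~0.612
```

G-mean is the geometric mean of sensitivity and specificity, so a model that labels every smoker as low risk scores 0 on it even when its accuracy looks high. This is why it is often reported alongside F1 and AUC on imbalanced data.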