
XGBoost-Based Framework for Smoking-Induced Noncommunicable Disease Prediction

Smoking-induced noncommunicable diseases (SiNCDs) have become a significant threat to public health and a leading cause of death globally. In the last decade, numerous studies have used artificial intelligence techniques to predict the risk of developing SiNCDs. However, determining the most si...


Bibliographic Details
Main authors: Davagdorj, Khishigsuren; Pham, Van Huy; Theera-Umpon, Nipon; Ryu, Keun Ho
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7558165/
https://www.ncbi.nlm.nih.gov/pubmed/32906777
http://dx.doi.org/10.3390/ijerph17186513
author Davagdorj, Khishigsuren
Pham, Van Huy
Theera-Umpon, Nipon
Ryu, Keun Ho
collection PubMed
description Smoking-induced noncommunicable diseases (SiNCDs) have become a significant threat to public health and a leading cause of death globally. In the last decade, numerous studies have used artificial intelligence techniques to predict the risk of developing SiNCDs. However, determining the most significant features and developing interpretable models remain challenging in such systems. In this study, we propose an efficient extreme gradient boosting (XGBoost)-based framework combined with a hybrid feature selection (HFS) method for SiNCD prediction among the general population in South Korea and the United States. First, HFS is performed in three stages: (I) significant features are selected by t-test and chi-square test; (II) multicollinearity analysis is used to obtain dissimilar features; (III) the final selection of the most representative features is made with the least absolute shrinkage and selection operator (LASSO). The selected features are then fed into the XGBoost predictive model. The experimental results show that the proposed model outperforms several existing baseline models. In addition, the proposed model reports feature importances, which enhances the interpretability of the SiNCD prediction model. Consequently, the XGBoost-based framework is expected to contribute to the early diagnosis and prevention of SiNCDs in public health.
format Online
Article
Text
id pubmed-7558165
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7558165 2020-10-29 XGBoost-Based Framework for Smoking-Induced Noncommunicable Disease Prediction Davagdorj, Khishigsuren Pham, Van Huy Theera-Umpon, Nipon Ryu, Keun Ho Int J Environ Res Public Health Article MDPI 2020-09-07 2020-09 /pmc/articles/PMC7558165/ /pubmed/32906777 http://dx.doi.org/10.3390/ijerph17186513 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title XGBoost-Based Framework for Smoking-Induced Noncommunicable Disease Prediction
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7558165/
https://www.ncbi.nlm.nih.gov/pubmed/32906777
http://dx.doi.org/10.3390/ijerph17186513
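
The three-stage hybrid feature selection summarized in the description field, followed by a boosted-tree classifier, could be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation. The helper name hybrid_feature_selection, the thresholds (alpha, corr_max, lasso_alpha), the use of pairwise Pearson correlation for the multicollinearity stage, the use of scikit-learn's Lasso on the 0/1 label, and the fallback to scikit-learn's GradientBoostingClassifier when the xgboost package is not installed are all assumptions made for the example.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

try:
    from xgboost import XGBClassifier as Booster  # model family named in the abstract
except ImportError:
    # Stand-in (assumption): a different gradient boosting implementation.
    from sklearn.ensemble import GradientBoostingClassifier as Booster


def hybrid_feature_selection(X, y, categorical, alpha=0.05, corr_max=0.8, lasso_alpha=0.02):
    """Hypothetical helper: return column indices that survive three stages."""
    # Stage I: univariate tests against the binary label.
    stage1 = []
    for j in range(X.shape[1]):
        if j in categorical:
            # Chi-square test of independence on the level-by-class table.
            levels = np.unique(X[:, j])
            table = np.array([[np.sum((X[:, j] == v) & (y == c)) for c in (0, 1)]
                              for v in levels])
            p = stats.chi2_contingency(table)[1]
        else:
            # Welch t-test comparing the feature between the two classes.
            p = stats.ttest_ind(X[y == 0, j], X[y == 1, j], equal_var=False).pvalue
        if p < alpha:
            stage1.append(j)

    # Stage II: drop a feature if it is highly correlated with one already kept.
    stage2 = []
    for j in stage1:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_max for k in stage2):
            stage2.append(j)
    if not stage2:
        return []

    # Stage III: LASSO on standardized features; keep nonzero coefficients.
    Xs = StandardScaler().fit_transform(X[:, stage2])
    lasso = Lasso(alpha=lasso_alpha).fit(Xs, y)
    return [j for j, w in zip(stage2, lasso.coef_) if w != 0.0]


# Synthetic data (not from the study): 5 features, binary outcome.
rng = np.random.default_rng(0)
n = 600
y = rng.integers(0, 2, n)
x0 = rng.normal(size=n) + 1.5 * y                    # informative, continuous
x1 = x0 + rng.normal(scale=0.05, size=n)             # near-duplicate of x0
x2 = rng.normal(size=n)                              # noise, continuous
x3 = (rng.random(n) < np.where(y == 1, 0.75, 0.25)).astype(float)  # informative, binary
x4 = rng.integers(0, 2, n).astype(float)             # noise, binary
X = np.column_stack([x0, x1, x2, x3, x4])

selected = hybrid_feature_selection(X, y, categorical={3, 4})

# Fit the booster on the selected columns and inspect feature importances.
X_train, X_test, y_train, y_test = train_test_split(
    X[:, selected], y, test_size=0.25, random_state=0, stratify=y)
model = Booster(n_estimators=50, max_depth=3, random_state=0).fit(X_train, y_train)
accuracy = float(np.mean(model.predict(X_test) == y_test))
importances = model.feature_importances_

print("selected columns:", selected)
print("held-out accuracy:", round(accuracy, 3))
print("importances:", dict(zip(selected, np.round(importances, 3))))
```

In this sketch the near-duplicate column is removed at stage II because it is tested after the column it copies. A variance inflation factor check would be an alternative way to handle multicollinearity; the record does not say which the authors used.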