
Machine Learning Models for Predicting the Occurrence of Respiratory Diseases Using Climatic and Air-Pollution Factors

OBJECTIVES: Because climatic and air-pollution factors are known to influence the occurrence of respiratory diseases, we used these factors to develop machine learning models for predicting the occurrence of respiratory diseases. METHODS: We obtained the daily number of respiratory disease patients...


Bibliographic Details
Main authors: Ku, Yunseo, Kwon, Soon Bin, Yoon, Jeong-Hwa, Mun, Seog-Kyun, Chang, Munyoung
Format: Online Article Text
Language: English
Published: Korean Society of Otorhinolaryngology-Head and Neck Surgery 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9149237/
https://www.ncbi.nlm.nih.gov/pubmed/34990536
http://dx.doi.org/10.21053/ceo.2021.01536
author Ku, Yunseo
Kwon, Soon Bin
Yoon, Jeong-Hwa
Mun, Seog-Kyun
Chang, Munyoung
collection PubMed
description OBJECTIVES: Because climatic and air-pollution factors are known to influence the occurrence of respiratory diseases, we used these factors to develop machine learning models for predicting the occurrence of respiratory diseases. METHODS: We obtained the daily number of respiratory disease patients in Seoul. We used climatic and air-pollution factors to predict the daily number of patients treated for respiratory diseases per 10,000 inhabitants. We applied the relief-based feature selection algorithm to evaluate the importance of each feature. We used gradient boosting and Gaussian process regression (GPR) to develop two separate prediction models. We employed the holdout cross-validation method, in which 75% of the data was used to train each model and the remaining 25% was used to test the trained model. We estimated the number of respiratory disease patients by applying the developed prediction models to the test set. To evaluate the performance of each model, we calculated the coefficient of determination (R²) and the root mean square error (RMSE) between the original and estimated numbers of respiratory disease patients. We used the Shapley Additive exPlanations (SHAP) approach to interpret the estimated output of each machine learning model. RESULTS: Features with negative weights in the relief-based algorithm were excluded. When gradient boosting was applied to unseen test data, R² and RMSE were 0.68 and 13.8, respectively. For GPR, R² and RMSE were 0.67 and 13.9, respectively. SHAP analysis showed that reductions in average temperature, daylight duration, average humidity, sulfur dioxide (SO₂), total solar insolation amount, and temperature difference increased the number of respiratory disease patients, whereas increases in atmospheric pressure, carbon monoxide (CO), and particulate matter ≤2.5 μm in aerodynamic diameter (PM₂.₅) increased the number of respiratory disease patients.
CONCLUSION: We successfully developed models for predicting the occurrence of respiratory diseases using climatic and air-pollution factors. These models could evolve into public warning systems.
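The evaluation procedure in the abstract (a 75%/25% holdout split, then R² and RMSE computed on the held-out portion) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, not the study's Seoul data; the helper names (holdout_split, r2_score, rmse, fit_line) are hypothetical; and a one-feature least-squares line stands in for the gradient boosting and GPR models, which the record does not describe in enough detail to reproduce.

```python
import math
import random


def holdout_split(rows, train_fraction=0.75, seed=0):
    """Shuffle rows and split them into (train, test) by train_fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def r2_score(observed, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, estimated):
    """Root mean square error between observed and estimated values."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def fit_line(xs, ys):
    """Stand-in model (assumption): ordinary least squares on one feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data, NOT the study's data: lower temperature -> more patients, plus noise.
rng = random.Random(42)
data = []
for _ in range(400):
    temp = rng.uniform(-10.0, 30.0)
    patients = 120.0 - 2.0 * temp + rng.gauss(0.0, 10.0)
    data.append((temp, patients))

# 75% of the rows train the model; the remaining 25% are held out for testing.
train, test = holdout_split(data, train_fraction=0.75, seed=1)
slope, intercept = fit_line([t for t, _ in train], [p for _, p in train])

# Score the fitted model on the unseen test rows only.
observed = [p for _, p in test]
estimated = [slope * t + intercept for t, _ in test]
test_r2 = r2_score(observed, estimated)
test_rmse = rmse(observed, estimated)
print(f"test R^2 = {test_r2:.2f}, RMSE = {test_rmse:.1f}")
```

Scoring only on the held-out rows is what makes the reported R² and RMSE an estimate of performance on unseen data rather than a measure of fit to the training set.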
format Online
Article
Text
id pubmed-9149237
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Korean Society of Otorhinolaryngology-Head and Neck Surgery
record_format MEDLINE/PubMed
spelling pubmed-9149237 2022-06-01 Machine Learning Models for Predicting the Occurrence of Respiratory Diseases Using Climatic and Air-Pollution Factors Ku, Yunseo Kwon, Soon Bin Yoon, Jeong-Hwa Mun, Seog-Kyun Chang, Munyoung Clin Exp Otorhinolaryngol Original Article Korean Society of Otorhinolaryngology-Head and Neck Surgery 2022-05 2022-01-07 /pmc/articles/PMC9149237/ /pubmed/34990536 http://dx.doi.org/10.21053/ceo.2021.01536 Text en Copyright © 2022 by Korean Society of Otorhinolaryngology-Head and Neck Surgery. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Machine Learning Models for Predicting the Occurrence of Respiratory Diseases Using Climatic and Air-Pollution Factors
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9149237/
https://www.ncbi.nlm.nih.gov/pubmed/34990536
http://dx.doi.org/10.21053/ceo.2021.01536