Machine Learning Models for Predicting the Occurrence of Respiratory Diseases Using Climatic and Air-Pollution Factors
OBJECTIVES: Because climatic and air-pollution factors are known to influence the occurrence of respiratory diseases, we used these factors to develop machine learning models for predicting the occurrence of respiratory diseases. METHODS: We obtained the daily number of respiratory disease patients...
Main Authors: | Ku, Yunseo; Kwon, Soon Bin; Yoon, Jeong-Hwa; Mun, Seog-Kyun; Chang, Munyoung |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Korean Society of Otorhinolaryngology-Head and Neck Surgery, 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9149237/ https://www.ncbi.nlm.nih.gov/pubmed/34990536 http://dx.doi.org/10.21053/ceo.2021.01536 |
_version_ | 1784717164941410304 |
---|---|
author | Ku, Yunseo Kwon, Soon Bin Yoon, Jeong-Hwa Mun, Seog-Kyun Chang, Munyoung |
author_facet | Ku, Yunseo Kwon, Soon Bin Yoon, Jeong-Hwa Mun, Seog-Kyun Chang, Munyoung |
author_sort | Ku, Yunseo |
collection | PubMed |
description | OBJECTIVES: Because climatic and air-pollution factors are known to influence the occurrence of respiratory diseases, we used these factors to develop machine learning models for predicting the occurrence of respiratory diseases. METHODS: We obtained the daily number of respiratory disease patients in Seoul. We used climatic and air-pollution factors to predict the daily number of patients treated for respiratory diseases per 10,000 inhabitants. We applied the relief-based feature selection algorithm to evaluate the importance of each feature. We used gradient boosting and Gaussian process regression (GPR) to develop two different prediction models. We also employed the holdout cross-validation method, in which 75% of the data was used to train the model, and the remaining 25% was used to test the trained model. We determined the estimated number of respiratory disease patients by applying the developed prediction models to the test set. To evaluate the performance of each model, we calculated the coefficient of determination (R²) and the root mean square error (RMSE) between the original and estimated numbers of respiratory disease patients. We used the Shapley Additive exPlanations (SHAP) approach to interpret the estimated output of each machine learning model. RESULTS: Features with negative weights in the relief-based algorithm were excluded. When applying gradient boosting to unseen test data, R² and RMSE were 0.68 and 13.8, respectively. For GPR, the R² and RMSE were 0.67 and 13.9, respectively. SHAP analysis showed that reductions in average temperature, daylight duration, average humidity, sulfur dioxide (SO₂), total solar insolation amount, and temperature difference increased the number of respiratory disease patients, whereas increases in atmospheric pressure, carbon monoxide (CO), and particulate matter ≤2.5 μm in aerodynamic diameter (PM₂.₅) increased the number of respiratory disease patients. CONCLUSION: We successfully developed models for predicting the occurrence of respiratory diseases using climatic and air-pollution factors. These models could evolve into public warning systems. |
format | Online Article Text |
id | pubmed-9149237 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Korean Society of Otorhinolaryngology-Head and Neck Surgery |
record_format | MEDLINE/PubMed |
spelling | pubmed-9149237 2022-06-01 Machine Learning Models for Predicting the Occurrence of Respiratory Diseases Using Climatic and Air-Pollution Factors Ku, Yunseo Kwon, Soon Bin Yoon, Jeong-Hwa Mun, Seog-Kyun Chang, Munyoung Clin Exp Otorhinolaryngol Original Article OBJECTIVES: Because climatic and air-pollution factors are known to influence the occurrence of respiratory diseases, we used these factors to develop machine learning models for predicting the occurrence of respiratory diseases. METHODS: We obtained the daily number of respiratory disease patients in Seoul. We used climatic and air-pollution factors to predict the daily number of patients treated for respiratory diseases per 10,000 inhabitants. We applied the relief-based feature selection algorithm to evaluate the importance of each feature. We used gradient boosting and Gaussian process regression (GPR) to develop two different prediction models. We also employed the holdout cross-validation method, in which 75% of the data was used to train the model, and the remaining 25% was used to test the trained model. We determined the estimated number of respiratory disease patients by applying the developed prediction models to the test set. To evaluate the performance of each model, we calculated the coefficient of determination (R²) and the root mean square error (RMSE) between the original and estimated numbers of respiratory disease patients. We used the Shapley Additive exPlanations (SHAP) approach to interpret the estimated output of each machine learning model. RESULTS: Features with negative weights in the relief-based algorithm were excluded. When applying gradient boosting to unseen test data, R² and RMSE were 0.68 and 13.8, respectively. For GPR, the R² and RMSE were 0.67 and 13.9, respectively. SHAP analysis showed that reductions in average temperature, daylight duration, average humidity, sulfur dioxide (SO₂), total solar insolation amount, and temperature difference increased the number of respiratory disease patients, whereas increases in atmospheric pressure, carbon monoxide (CO), and particulate matter ≤2.5 μm in aerodynamic diameter (PM₂.₅) increased the number of respiratory disease patients. CONCLUSION: We successfully developed models for predicting the occurrence of respiratory diseases using climatic and air-pollution factors. These models could evolve into public warning systems. Korean Society of Otorhinolaryngology-Head and Neck Surgery 2022-05 2022-01-07 /pmc/articles/PMC9149237/ /pubmed/34990536 http://dx.doi.org/10.21053/ceo.2021.01536 Text en Copyright © 2022 by Korean Society of Otorhinolaryngology-Head and Neck Surgery https://creativecommons.org/licenses/by-nc/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article Ku, Yunseo Kwon, Soon Bin Yoon, Jeong-Hwa Mun, Seog-Kyun Chang, Munyoung Machine Learning Models for Predicting the Occurrence of Respiratory Diseases Using Climatic and Air-Pollution Factors |
title | Machine Learning Models for Predicting the Occurrence of Respiratory Diseases Using Climatic and Air-Pollution Factors |
title_full | Machine Learning Models for Predicting the Occurrence of Respiratory Diseases Using Climatic and Air-Pollution Factors |
title_fullStr | Machine Learning Models for Predicting the Occurrence of Respiratory Diseases Using Climatic and Air-Pollution Factors |
title_full_unstemmed | Machine Learning Models for Predicting the Occurrence of Respiratory Diseases Using Climatic and Air-Pollution Factors |
title_short | Machine Learning Models for Predicting the Occurrence of Respiratory Diseases Using Climatic and Air-Pollution Factors |
title_sort | machine learning models for predicting the occurrence of respiratory diseases using climatic and air-pollution factors |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9149237/ https://www.ncbi.nlm.nih.gov/pubmed/34990536 http://dx.doi.org/10.21053/ceo.2021.01536 |
work_keys_str_mv | AT kuyunseo machinelearningmodelsforpredictingtheoccurrenceofrespiratorydiseasesusingclimaticandairpollutionfactors AT kwonsoonbin machinelearningmodelsforpredictingtheoccurrenceofrespiratorydiseasesusingclimaticandairpollutionfactors AT yoonjeonghwa machinelearningmodelsforpredictingtheoccurrenceofrespiratorydiseasesusingclimaticandairpollutionfactors AT munseogkyun machinelearningmodelsforpredictingtheoccurrenceofrespiratorydiseasesusingclimaticandairpollutionfactors AT changmunyoung machinelearningmodelsforpredictingtheoccurrenceofrespiratorydiseasesusingclimaticandairpollutionfactors |
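
The evaluation procedure summarized in the description field (a 75%/25% holdout split, followed by the coefficient of determination and the root mean square error on the held-out portion) can be illustrated with a minimal Python sketch. This is not the authors' code. The data are synthetic, the single temperature feature and its coefficients are invented for illustration, and a one-variable least-squares line stands in for the gradient boosting and GPR models named in the record. The helper names (`holdout_split`, `fit_line`, `r_squared`, `rmse`) are hypothetical.

```python
import math
import random


def holdout_split(rows, train_fraction=0.75, seed=0):
    """Shuffle the rows and split them into train and test portions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(pairs):
    """Least-squares fit of y = intercept + slope * x (stand-in model only)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Synthetic daily records: (average temperature, patient count).
# The numbers are made up and do not come from the study.
rng = random.Random(42)
data = []
for _ in range(400):
    temp = rng.uniform(-10.0, 30.0)
    patients = 120.0 - 2.0 * temp + rng.gauss(0.0, 10.0)
    data.append((temp, patients))

# 75% of the rows train the model; the remaining 25% are held out for testing.
train, test = holdout_split(data, train_fraction=0.75)
intercept, slope = fit_line(train)

actual = [y for _, y in test]
predicted = [intercept + slope * x for x, _ in test]
test_r2 = r_squared(actual, predicted)
test_rmse = rmse(actual, predicted)
print(f"train={len(train)} test={len(test)} R2={test_r2:.2f} RMSE={test_rmse:.1f}")
```

Both metrics are computed only on the held-out rows, so they describe performance on data the model did not see during fitting, which is how the record reports its R² and RMSE values.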