
Design of a Spark Big Data Framework for PM(2.5) Air Pollution Forecasting

In recent years, with rapid economic development, air pollution has become extremely serious, causing many negative effects on health, environment and medical costs. PM(2.5) is one of the main components of air pollution. Therefore, it is necessary to know the PM(2.5) air quality in advance for heal...


Bibliographic Details
Main authors:	Shih, Dong-Her; To, Thi Hien; Nguyen, Ly Sy Phu; Wu, Ting-Wei; You, Wen-Ting
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8296958/ https://www.ncbi.nlm.nih.gov/pubmed/34281023 http://dx.doi.org/10.3390/ijerph18137087

author	Shih, Dong-Her To, Thi Hien Nguyen, Ly Sy Phu Wu, Ting-Wei You, Wen-Ting
collection	PubMed
description	In recent years, with rapid economic development, air pollution has become extremely serious, with many negative effects on health, the environment, and medical costs. PM(2.5) is one of the main components of air pollution, so it is necessary to know the PM(2.5) air quality in advance to protect health. Many studies on air quality are based on the government’s official air quality monitoring stations, which cannot be widely deployed because of their high cost. Furthermore, government monitoring stations update only once an hour, so short-term PM(2.5) concentration peaks that arrive with little warning are hard to capture. Short-interval data from many stations, however, is huge in volume and complex to calculate, analyze, and predict. This alleviates the high computational requirements of the original predictor, thus making Spark suitable for the considered problem. This study proposes an instant PM(2.5) prediction architecture based on the Spark big data framework to handle the huge volume of data from the LASS community. The proposed framework is divided into three modules. It collects real-time PM(2.5) data and performs ensemble learning with three machine learning algorithms (Linear Regression, Random Forest, Gradient Boosting Decision Tree) to predict the PM(2.5) concentration over the next 30 to 180 min, with an accompanying visualization graph. The experimental results show that the proposed Spark big data ensemble prediction model performs best for the next-30-min prediction (R² up to 0.96), and that the ensemble model outperforms any single machine learning model. Taiwan has long suffered from relatively poor air quality. Air pollutant monitoring data from the LASS community can provide broader monitoring coverage; however, the data is large and difficult to integrate or analyze. The proposed Spark big data framework system can provide short-term PM(2.5) forecasts and help decision-makers take proper action immediately.
format	Online Article Text
id	pubmed-8296958
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8296958 2021-07-23 Design of a Spark Big Data Framework for PM(2.5) Air Pollution Forecasting. Shih, Dong-Her; To, Thi Hien; Nguyen, Ly Sy Phu; Wu, Ting-Wei; You, Wen-Ting. Int J Environ Res Public Health. Article. MDPI 2021-07-02. /pmc/articles/PMC8296958/ /pubmed/34281023 http://dx.doi.org/10.3390/ijerph18137087 Text en. © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Design of a Spark Big Data Framework for PM(2.5) Air Pollution Forecasting
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8296958/ https://www.ncbi.nlm.nih.gov/pubmed/34281023 http://dx.doi.org/10.3390/ijerph18137087
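
The description field above mentions ensemble learning over three regressors, scored by R². As a rough illustration only, the sketch below combines per-model predictions by unweighted averaging and computes R². The averaging rule, the function names `average_ensemble` and `r2_score`, and all numbers are assumptions made for this example; the record does not say how the authors combine the models, and the values are toy data, not results from the article.

```python
from statistics import fmean


def average_ensemble(predictions):
    """Combine per-model prediction lists by unweighted averaging.

    Assumption: simple averaging stands in for whatever combination
    rule the article actually uses.
    """
    return [fmean(values) for values in zip(*predictions)]


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy PM2.5 readings and toy model outputs (hypothetical values).
actual = [12.0, 15.0, 20.0, 18.0]
model_outputs = {
    "linear_regression": [11.0, 16.0, 19.0, 19.0],
    "random_forest": [13.0, 14.0, 21.0, 17.0],
    "gbdt": [12.0, 15.0, 23.0, 18.0],
}

ensemble = average_ensemble(list(model_outputs.values()))
ensemble_r2 = r2_score(actual, ensemble)
```

In this toy case the averaged prediction scores higher than each of the three individual models, which matches the direction of the claim in the description but says nothing about the article's real data.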
