Detecting Elevated Air Pollution Levels by Monitoring Web Search Queries: Algorithm Development and Validation

BACKGROUND: Real-time air pollution monitoring is a valuable tool for public health and environmental surveillance. In recent years, there has been a dramatic increase in air pollution forecasting and monitoring research using artificial neural networks. Most prior work relied on modeling pollutant...

Bibliographic Details
Main authors: Lin, Chen, Yousefi, Safoora, Kahoro, Elvis, Karisani, Payam, Liang, Donghai, Sarnat, Jeremy, Agichtein, Eugene
Format: Online Article Text
Language: English
Published: JMIR Publications 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9808603/
https://www.ncbi.nlm.nih.gov/pubmed/36534457
http://dx.doi.org/10.2196/23422
_version_ 1784862968700207104
author Lin, Chen
Yousefi, Safoora
Kahoro, Elvis
Karisani, Payam
Liang, Donghai
Sarnat, Jeremy
Agichtein, Eugene
author_sort Lin, Chen
collection PubMed
description BACKGROUND: Real-time air pollution monitoring is a valuable tool for public health and environmental surveillance. In recent years, there has been a dramatic increase in air pollution forecasting and monitoring research using artificial neural networks. Most prior work relied on modeling pollutant concentrations collected from ground-based monitors and meteorological data for long-term forecasting of outdoor ozone (O3), oxides of nitrogen, and fine particulate matter (PM2.5). Given that traditional, highly sophisticated air quality monitors are expensive and not universally available, these models cannot adequately serve those not living near pollutant monitoring sites. Furthermore, because prior models were built based on physical measurement data collected from sensors, they may not be suitable for predicting the public health effects of pollution exposure. OBJECTIVE: This study aimed to develop and validate models to nowcast the observed pollution levels using web search data, which are publicly available in near real time from major search engines. METHODS: We developed novel machine learning–based models using both traditional supervised classification methods and state-of-the-art deep learning methods to detect elevated air pollution levels at the US city level by using generally available meteorological data and aggregate web-based search volume data derived from Google Trends. We validated the performance of these methods by predicting 3 critical air pollutants (O3, nitrogen dioxide, and PM2.5) across 10 major US metropolitan statistical areas in 2017 and 2018. We also explore different variations of the long short-term memory model and propose a novel search term dictionary learner-long short-term memory model to learn sequential patterns across multiple search terms for prediction. RESULTS: The top-performing model was a deep neural sequence model long short-term memory, using meteorological and web search data, and reached an accuracy of 0.82 (F1-score 0.51) for O3, 0.74 (F1-score 0.41) for nitrogen dioxide, and 0.85 (F1-score 0.27) for PM2.5, when used for detecting elevated pollution levels. Compared with using only meteorological data, the proposed method achieved superior accuracy by incorporating web search data. CONCLUSIONS: The results show that incorporating web search data with meteorological data improves the nowcasting performance for all 3 pollutants and suggest promising novel applications for tracking global physical phenomena using web search data.
format Online
Article
Text
id pubmed-9808603
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-9808603 2023-01-04 Detecting Elevated Air Pollution Levels by Monitoring Web Search Queries: Algorithm Development and Validation Lin, Chen Yousefi, Safoora Kahoro, Elvis Karisani, Payam Liang, Donghai Sarnat, Jeremy Agichtein, Eugene JMIR Form Res Original Paper JMIR Publications 2022-12-19 /pmc/articles/PMC9808603/ /pubmed/36534457 http://dx.doi.org/10.2196/23422 Text en ©Chen Lin, Safoora Yousefi, Elvis Kahoro, Payam Karisani, Donghai Liang, Jeremy Sarnat, Eugene Agichtein. Originally published in JMIR Formative Research (https://formative.jmir.org), 19.12.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.
title Detecting Elevated Air Pollution Levels by Monitoring Web Search Queries: Algorithm Development and Validation
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9808603/
https://www.ncbi.nlm.nih.gov/pubmed/36534457
http://dx.doi.org/10.2196/23422