Graph Machine Learning for Improved Imputation of Missing Tropospheric Ozone Data

Bibliographic Details
Main Authors: Betancourt, Clara; Li, Cathy W. Y.; Kleinert, Felix; Schultz, Martin G.
Format: Online Article Text
Language: English
Published: American Chemical Society, 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10666531/
https://www.ncbi.nlm.nih.gov/pubmed/37661931
http://dx.doi.org/10.1021/acs.est.3c05104
author Betancourt, Clara
Li, Cathy W. Y.
Kleinert, Felix
Schultz, Martin G.
collection PubMed
description Gaps in the measurement series of atmospheric pollutants can impede the reliable assessment of their impacts and trends. We propose a new method for imputing missing data of the air pollutant tropospheric ozone using the graph machine learning algorithm “correct and smooth”. This algorithm uses auxiliary data that characterize the measurement location, together with ozone observations at neighboring sites, to improve the imputations of simple statistical and machine learning models. We apply our method to 2011 data from 278 stations of the German Environment Agency (Umweltbundesamt, UBA) monitoring network. The preliminary version of these data exhibits three gap patterns: shorter gaps in the range of hours, longer gaps of up to several months, and gaps occurring at multiple stations at once. For short gaps of up to 5 h, linear interpolation is most accurate. Longer gaps at single stations are most effectively imputed by a random forest combined with correct and smooth. For longer gaps at multiple stations, the correct and smooth algorithm still improved on the random forest despite a lack of data in the neighborhood of the missing values. We therefore suggest a hybrid of linear interpolation and graph machine learning for the imputation of tropospheric ozone time series.
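The description combines two imputation steps. The Python sketch below illustrates them under stated assumptions; it is not the authors' code. `correct_and_smooth` follows the generic correct-and-smooth scheme (originally introduced for node classification by Huang et al.): residuals of a base model, for example a random forest, are propagated from observed to missing entries over a station graph, and the corrected values are then smoothed with observed values clamped. The adjacency matrix, the symmetric normalization, and the values of `alpha_c`, `alpha_s`, `scale`, and `n_iter` are hypothetical choices. `hybrid_impute` applies linear interpolation to interior gaps of at most `max_gap` hourly steps (assumed here to correspond to the 5 h threshold) and falls back to a graph-based estimate elsewhere.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric normalization S = D^-1/2 A D^-1/2 of a station adjacency matrix."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]


def propagate(S, x0, alpha, n_iter):
    """Iterate x <- (1 - alpha) * x0 + alpha * S @ x, starting from x0."""
    x = x0.copy()
    for _ in range(n_iter):
        x = (1.0 - alpha) * x0 + alpha * (S @ x)
    return x


def correct_and_smooth(adj, base_pred, y, observed, alpha_c=0.8, alpha_s=0.8,
                       scale=1.0, n_iter=50):
    """Refine base-model predictions on a graph of stations (illustrative sketch).

    adj:       (n, n) symmetric, non-negative adjacency between stations (assumed)
    base_pred: (n, t) predictions of a base model, e.g. a random forest
    y:         (n, t) observations; entries where `observed` is False are ignored
    observed:  (n, t) boolean mask of available measurements
    Default hyperparameters are hypothetical, not the values used in the study.
    """
    S = normalized_adjacency(adj)
    # Correct: residuals are known only at observed entries and are spread to gaps.
    residual = np.where(observed, y - base_pred, 0.0)
    corrected = base_pred + scale * propagate(S, residual, alpha_c, n_iter)
    # Smooth: clamp observed entries to measured values, then propagate.
    guess = np.where(observed, y, corrected)
    smoothed = propagate(S, guess, alpha_s, n_iter)
    # Keep measured values where they exist; fill only the gaps.
    return np.where(observed, y, smoothed)


def short_gap_mask(series, max_gap=5):
    """True at missing values inside a run of <= max_gap NaNs bounded by observations."""
    missing = np.isnan(series)
    mask = np.zeros_like(missing)
    i, n = 0, len(series)
    while i < n:
        if not missing[i]:
            i += 1
            continue
        j = i
        while j < n and missing[j]:
            j += 1
        # The missing run is [i, j); only interior runs can be interpolated.
        if j - i <= max_gap and i > 0 and j < n:
            mask[i:j] = True
        i = j
    return mask


def hybrid_impute(series, fallback, max_gap=5):
    """Linear interpolation for short interior gaps, `fallback` values for all other gaps."""
    t = np.arange(len(series))
    obs = ~np.isnan(series)
    out = series.copy()
    short = short_gap_mask(series, max_gap)
    if short.any():
        out[short] = np.interp(t[short], t[obs], series[obs])
    long_gaps = ~obs & ~short
    out[long_gaps] = fallback[long_gaps]
    return out
```

Each column of the `(n_stations, n_hours)` arrays is treated as an independent signal on the station graph, so a single matrix product propagates all hours at once; one row of the `correct_and_smooth` output can then serve as `fallback` for that station's series.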
format Online
Article
Text
id pubmed-10666531
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10666531 2023-11-23 Environ Sci Technol American Chemical Society 2023-09-04 /pmc/articles/PMC10666531/ /pubmed/37661931 http://dx.doi.org/10.1021/acs.est.3c05104 Text en © 2023 The Authors. Published by American Chemical Society. License: https://creativecommons.org/licenses/by/4.0/ (permits the broadest form of re-use, including for commercial purposes, provided that author attribution and integrity are maintained).
title Graph Machine Learning for Improved Imputation of Missing Tropospheric Ozone Data
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10666531/
https://www.ncbi.nlm.nih.gov/pubmed/37661931
http://dx.doi.org/10.1021/acs.est.3c05104