Value and limitations of machine learning in high-frequency nutrient data for gap-filling, forecasting, and transport process interpretation

High-frequency monitoring of water quality in catchments brings along the challenge of post-processing large amounts of data. Moreover, monitoring stations are often remote and technical issues resulting in data gaps are common. Machine learning algorithms can be applied to fill these gaps, and to a...

Bibliographic Details
Main authors: Barcala, Victoria; Rozemeijer, Joachim; Ouwerkerk, Kevin; Gerner, Laurens; Osté, Leonard
Format: Online Article Text
Language: English
Published: Springer International Publishing 2023
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10299926/
https://www.ncbi.nlm.nih.gov/pubmed/37368078
http://dx.doi.org/10.1007/s10661-023-11519-9
collection PubMed
description High-frequency monitoring of water quality in catchments brings the challenge of post-processing large amounts of data. Moreover, monitoring stations are often remote, and technical issues resulting in data gaps are common. Machine learning algorithms can be applied to fill these gaps and, to a certain extent, for predictions and interpretation. The objectives of this study were (1) to evaluate six different machine learning models for gap-filling in a high-frequency nitrate and total phosphorus concentration time series, (2) to showcase the potential added value (and limitations) of machine learning to interpret underlying processes, and (3) to study the limits of machine learning algorithms for predictions outside the training period. We used a 4-year high-frequency dataset from a ditch draining one intensive dairy farm in the east of The Netherlands. Continuous time series of precipitation, evapotranspiration, groundwater levels, discharge, turbidity, and nitrate or total phosphorus were used as predictors for total phosphorus and nitrate concentrations, respectively. Our results showed that the random forest algorithm had the best performance in filling data gaps, with R² higher than 0.92 and short computation times. The feature importance helped in understanding the changes in transport processes linked to water conservation measures and rain variability. Applying the machine learning model outside the training period resulted in low performance, largely due to system changes (manure surplus and water conservation) which were not included as predictors. This study offers a valuable and novel example of how to use and interpret machine learning models for post-processing high-frequency water quality data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s10661-023-11519-9.
id pubmed-10299926
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Environ Monit Assess
publication date 2023-06-27
license © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
topic Research