Machine learning-based estimation of riverine nutrient concentrations and associated uncertainties caused by sampling frequencies

Bibliographic Details
Main authors: Chen, Shengyue; Zhang, Zhenyu; Lin, Juanjuan; Huang, Jinliang
Format: Online article (text)
Language: English
Published: Public Library of Science, 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9278742/
https://www.ncbi.nlm.nih.gov/pubmed/35830456
http://dx.doi.org/10.1371/journal.pone.0271458
collection PubMed
description Accurate and sufficient water quality data are essential for watershed management and sustainability. With the development of online sensors, machine learning models have shown great potential for estimating water quality. However, accurate estimation is challenging because of uncertainties related to the models used and the input data. In this study, random forest (RF), support vector machine (SVM), and back-propagation neural network (BPNN) models are developed with three sampling frequency datasets (i.e., 4-hourly, daily, and weekly) and five conventional indicators (i.e., water temperature (WT), hydrogen ion concentration (pH), electrical conductivity (EC), dissolved oxygen (DO), and turbidity (TUR)) as surrogates to individually estimate riverine total phosphorus (TP), total nitrogen (TN), and ammonia nitrogen (NH₄⁺-N) in a small-scale coastal watershed. The results show that the RF model outperforms the SVM and BPNN models in estimation performance, explaining much of the variation in TP (79 ± 1.3%), TN (84 ± 0.9%), and NH₄⁺-N (75 ± 1.3%) when using the 4-hourly sampling frequency dataset. Higher sampling frequencies gave the RF model significantly better performance for all three nutrients (4-hourly > daily > weekly) in terms of R² and Nash-Sutcliffe efficiency (NSE) values. WT, EC, and TUR were the three key input indicators for nutrient estimation in RF. Our study highlights the importance of high-frequency data as input to machine learning model development. The RF model is shown to be viable for riverine nutrient estimation in small-scale watersheds that are important for local water security.
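The abstract reports model skill as R² and NSE values but does not define either score. As a minimal sketch, assuming the standard Nash-Sutcliffe definition for NSE and taking R² to be the squared Pearson correlation (the record does not say which R² variant the authors used), both can be computed with the Python standard library. The function names `nse` and `r_squared` and the sample values are hypothetical, not taken from the paper.

```python
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> int:
    """Validate inputs and return the series length."""
    n = len(observed)
    if n < 2 or n != len(predicted):
        raise ValueError("need two equal-length series with at least 2 points")
    return n


def nse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency.

    NSE = 1 - sum((O - P)^2) / sum((O - mean(O))^2).
    1.0 is a perfect fit, 0.0 is no better than predicting mean(O),
    and negative values are worse than that.
    """
    n = _check(observed, predicted)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed series is constant; NSE is undefined")
    return 1.0 - sse / ss_tot


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values.

    Assumption: one common hydrological definition of R^2. Unlike NSE,
    it ignores systematic bias in the predictions.
    """
    n = _check(observed, predicted)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov * cov / (var_obs * var_pred)


# Made-up values for illustration only; these are not data from the study.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
biased = [2.0 * x for x in obs]  # perfectly correlated but systematically high
print(f"R2 = {r_squared(obs, biased):.2f}, NSE = {nse(obs, biased):.2f}")
# prints: R2 = 1.00, NSE = -4.50
```

The biased example shows why the two scores are complementary: a prediction can track every rise and fall of the observations (R² = 1) while still being far off in magnitude, which NSE penalises.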
id pubmed-9278742
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9278742 2022-07-14 PLoS One Research Article
Public Library of Science 2022-07-13 /pmc/articles/PMC9278742/ /pubmed/35830456 http://dx.doi.org/10.1371/journal.pone.0271458 Text en © 2022 Chen et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article