The Utility of Machine Learning Models for Predicting Chemical Contaminants in Drinking Water: Promise, Challenges, and Opportunities
PURPOSE OF REVIEW: This review aims to better understand the utility of machine learning algorithms for predicting spatial patterns of contaminants in the United States (U.S.) drinking water. RECENT FINDINGS: We found 27 U.S. drinking water studies in the past ten years that used machine learning al...
Main authors: | Hu, Xindi C., Dai, Mona, Sun, Jennifer M., Sunderland, Elsie M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9883334/ https://www.ncbi.nlm.nih.gov/pubmed/36527604 http://dx.doi.org/10.1007/s40572-022-00389-x |
_version_ | 1784879486693539840 |
---|---|
author | Hu, Xindi C. Dai, Mona Sun, Jennifer M. Sunderland, Elsie M. |
author_sort | Hu, Xindi C. |
collection | PubMed |
description | PURPOSE OF REVIEW: This review aims to better understand the utility of machine learning algorithms for predicting spatial patterns of contaminants in the United States (U.S.) drinking water. RECENT FINDINGS: We found 27 U.S. drinking water studies in the past ten years that used machine learning algorithms to predict water quality. Most studies (42%) developed random forest classification models for groundwater. Continuous models show low predictive power, suggesting that larger datasets and additional predictors are needed. Categorical/classification models for arsenic and nitrate that predict exceedances of pollution thresholds are most common in the literature because of good national scale data coverage and priority as environmental health concerns. Most groundwater data used to develop models were obtained from the United States Geological Survey (USGS) National Water Information System (NWIS). Predictors were similar across contaminants but challenges are posed by the lack of a standard methodology for imputation, pre-processing, and differing availability of data across regions. SUMMARY: We reviewed 27 articles that focused on seven drinking water contaminants. Good performance metrics were reported for binary models that classified chemical concentrations above a threshold value by finding significant predictors. Classification models are especially useful for assisting in the design of sampling efforts by identifying high-risk areas. Only a few studies have developed continuous models and obtaining good predictive performance for such models is still challenging. Improving continuous models is important for potential future use in epidemiological studies to supplement data gaps in exposure assessments for drinking water contaminants. While significant progress has been made over the past decade, methodological advances are still needed for selecting appropriate model performance metrics and accounting for spatial autocorrelations in data. Finally, improved infrastructure for code and data sharing would spearhead more rapid advances in machine-learning models for drinking water quality. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s40572-022-00389-x. |
format | Online Article Text |
id | pubmed-9883334 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-9883334 2023-01-29 The Utility of Machine Learning Models for Predicting Chemical Contaminants in Drinking Water: Promise, Challenges, and Opportunities Hu, Xindi C. Dai, Mona Sun, Jennifer M. Sunderland, Elsie M. Curr Environ Health Rep Metals and Health (TR Sanchez and M Tellez-Plaza, Section Editors) Springer International Publishing 2022-12-17 2023 /pmc/articles/PMC9883334/ /pubmed/36527604 http://dx.doi.org/10.1007/s40572-022-00389-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
title | The Utility of Machine Learning Models for Predicting Chemical Contaminants in Drinking Water: Promise, Challenges, and Opportunities |
topic | Metals and Health (TR Sanchez and M Tellez-Plaza, Section Editors) |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9883334/ https://www.ncbi.nlm.nih.gov/pubmed/36527604 http://dx.doi.org/10.1007/s40572-022-00389-x |