
An exploration of challenges associated with machine learning for time series forecasting of COVID-19 community spread using wastewater-based epidemiological data


Bibliographic Details
Main authors: Vaughan, Liam; Zhang, Muyang; Gu, Haoran; Rose, Joan B.; Naughton, Colleen C.; Medema, Gertjan; Allan, Vajra; Roiko, Anne; Blackall, Linda; Zamyadi, Arash
Format: Online article (text)
Language: English
Published: Elsevier B.V. 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9597519/
https://www.ncbi.nlm.nih.gov/pubmed/36306840
http://dx.doi.org/10.1016/j.scitotenv.2022.159748
collection PubMed
description Wastewater-based epidemiology (WBE) has gained increasing attention as a complementary tool to conventional surveillance methods, with potential for significant resource and labour savings when used for public health monitoring. Using WBE datasets to train machine learning algorithms and develop predictive models may also facilitate early warnings for the spread of outbreaks. The challenges associated with using machine learning for the analysis of WBE datasets and time series forecasting of COVID-19 were explored by running Random Forest (RF) algorithms on WBE datasets across 108 sites in five regions: Scotland, Catalonia, Ohio, the Netherlands, and Switzerland. This method uses measurements of SARS-CoV-2 RNA fragment concentration in samples taken at the inlets of wastewater treatment plants, providing insight into the prevalence of infection in upstream wastewater catchment populations. RF's forecasting performance at each site was quantitatively evaluated by determining mean absolute percentage error (MAPE) values, which were used to highlight challenges affecting future implementations of RF for WBE forecasting efforts. Performance was generally poor using WBE datasets from Catalonia, Scotland, and Ohio, with ‘reasonable’ or better forecasts constituting 0 %, 5 %, and 0 % of these regions' forecasts, respectively. RF's performance was much stronger with WBE data from the Netherlands and Switzerland, which provided 55 % and 45 % ‘reasonable’ or better forecasts, respectively. Sampling frequency and training set size were identified as key factors contributing to accuracy, while inclusion of too many unnecessary variables (e.g., flow data) was identified as a contributing factor to poor performance. The contribution of catchment population to forecast accuracy was more ambiguous. This study determined that the factors governing RF's forecast performance are complicated and interrelated, which presents challenges for further work in this space.
A sufficiently accurate further iteration of the tool discussed within this study would provide significant but varying value for public health departments monitoring future or ongoing outbreaks, assisting the implementation of timely health response measures.
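The description evaluates each site's forecast by MAPE and then reports, per region, the share of forecasts rated ‘reasonable’ or better. The sketch below illustrates that bookkeeping under stated assumptions: it uses the standard MAPE definition and the widely cited Lewis (1982) interpretation bands (below 10 % highly accurate, 10 to 20 % good, 20 to 50 % reasonable, above 50 % inaccurate). This record does not state which thresholds the authors applied, and the function names `mape`, `mape_band`, and `share_reasonable_or_better` are hypothetical.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, expressed in percent.

    MAPE = (100 / n) * sum(|(A_t - F_t) / A_t|) over n paired observations.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        # MAPE divides by the observed value, so a zero observation is undefined.
        if a == 0:
            raise ValueError("MAPE is undefined when an observed value is zero")
        total += abs((a - f) / a)
    return 100.0 * total / len(actual)


def mape_band(value: float) -> str:
    """Map a MAPE (percent) to a label.

    ASSUMPTION: Lewis (1982) bands; boundary handling (e.g. exactly 50 %
    counted as 'reasonable') is a choice made for this sketch.
    """
    if value < 10:
        return "highly accurate"
    if value < 20:
        return "good"
    if value <= 50:
        return "reasonable"
    return "inaccurate"


def share_reasonable_or_better(mapes: Sequence[float]) -> float:
    """Percentage of site forecasts rated 'reasonable' or better."""
    if len(mapes) == 0:
        raise ValueError("need at least one MAPE value")
    ok = sum(1 for m in mapes if mape_band(m) != "inaccurate")
    return 100.0 * ok / len(mapes)
```

For example, `share_reasonable_or_better([5, 15, 45, 80])` returns 75.0, since three of the four illustrative MAPE values fall at or below 50 %. Because MAPE is undefined when an observed value is zero, the sketch raises an error rather than silently skipping such points.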
id pubmed-9597519
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Sci Total Environ (Science of the Total Environment), Elsevier B.V.
dates 2023-02-01 (issue), 2022-10-25 (online)
rights © 2022 Elsevier B.V. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre, including this research content, immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database, with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
topic Article