
An interpretable time series machine learning method for varying forecast and nowcast lengths in wastewater-based epidemiology

Wastewater-based epidemiology has emerged as a viable tool for monitoring disease prevalence in a population. This paper details a time series machine learning (TSML) method for predicting COVID-19 cases from wastewater and environmental variables. The TSML method utilizes a number of techniques to...


Bibliographic Details
Main authors: Lai, Mallory, Wulff, Shaun S., Cao, Yongtao, Robinson, Timothy J., Rajapaksha, Rasika
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10562867/
https://www.ncbi.nlm.nih.gov/pubmed/37822674
http://dx.doi.org/10.1016/j.mex.2023.102382
collection PubMed
description Wastewater-based epidemiology has emerged as a viable tool for monitoring disease prevalence in a population. This paper details a time series machine learning (TSML) method for predicting COVID-19 cases from wastewater and environmental variables. The TSML method utilizes a number of techniques to create an interpretable, hypothesis-driven framework for machine learning that can handle different nowcast and forecast lengths. Some of the techniques employed include:
• Feature engineering to construct interpretable features, like site-specific lead times, hypothesized to be potential predictors of COVID-19 cases.
• Feature selection to identify features with the best predictive performance for the tasks of nowcasting and forecasting.
• Prequential evaluation to prevent data leakage while evaluating the performance of the machine learning algorithm.
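The three techniques named in the description (lagged "lead time" features, feature selection, and prequential evaluation) can be illustrated with a small sketch. This is not the authors' implementation: the record does not specify their model, features, or error metric, so the straight-line model, the mean absolute error, the toy data, and every function name below (`make_lagged`, `fit_line`, `prequential_mae`, `select_lag`) are assumptions chosen only to show the idea of "test on the next point, then add it to the training set".

```python
from statistics import mean


def make_lagged(signal, target, lag):
    """Pair target[t] with signal[t - lag].

    `lag` stands in for a hypothetical lead time, in time steps,
    between the wastewater signal and reported cases.
    """
    return [(signal[t - lag], target[t]) for t in range(lag, len(target))]


def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        slope = 0.0  # constant predictor: fall back to the mean of y
    else:
        slope = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sxx
    return slope, y_bar - slope * x_bar


def prequential_mae(pairs, min_train=3):
    """Test-then-train evaluation: each point is predicted by a model
    fitted only on the points that precede it, so no future data leaks
    into the fit."""
    errors = []
    for t in range(min_train, len(pairs)):
        slope, intercept = fit_line(pairs[:t])  # past observations only
        x_t, y_t = pairs[t]
        errors.append(abs(slope * x_t + intercept - y_t))
    return mean(errors)


def select_lag(signal, target, candidate_lags, min_train=3):
    """Pick the candidate lag with the lowest prequential error.

    Simplification: a full study would nest this selection inside the
    prequential loop as well.
    """
    scores = {
        lag: prequential_mae(make_lagged(signal, target, lag), min_train)
        for lag in candidate_lags
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data (invented): cases follow the signal two steps later.
    signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
    cases = [0, 0] + [2 * s + 1 for s in signal[:-2]]
    best_lag, scores = select_lag(signal, cases, [0, 1, 2])
    print(best_lag, scores)
```

The expanding training window is one common way to run a prequential evaluation; a sliding window of fixed length is an equally valid variant when older observations are expected to become less relevant.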
format Online
Article
Text
id pubmed-10562867
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10562867 2023-10-11 MethodsX Bioinformatics Elsevier 2023-09-27 /pmc/articles/PMC10562867/ /pubmed/37822674 http://dx.doi.org/10.1016/j.mex.2023.102382 Text en © 2023 Published by Elsevier B.V. https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
topic Bioinformatics