A Dirichlet process model for classifying and forecasting epidemic curves

Bibliographic Details
Main Authors: Nsoesie, Elaine O; Leman, Scotland C; Marathe, Madhav V
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3901791/
https://www.ncbi.nlm.nih.gov/pubmed/24405642
http://dx.doi.org/10.1186/1471-2334-14-12
author Nsoesie, Elaine O
Leman, Scotland C
Marathe, Madhav V
collection PubMed
description BACKGROUND: A forecast can be defined as an endeavor to quantitatively estimate a future event or probabilities assigned to a future occurrence. Forecasting stochastic processes such as epidemics is challenging since there are several biological, behavioral, and environmental factors that influence the number of cases observed at each point during an epidemic. However, accurate forecasts of epidemics would impact timely and effective implementation of public health interventions. In this study, we introduce a Dirichlet process (DP) model for classifying and forecasting influenza epidemic curves.
METHODS: The DP model is a nonparametric Bayesian approach that enables the matching of current influenza activity to simulated and historical patterns, identifies epidemic curves different from those observed in the past and enables prediction of the expected epidemic peak time. The method was validated using simulated influenza epidemics from an individual-based model and the accuracy was compared to that of the tree-based classification technique, Random Forest (RF), which has been shown to achieve high accuracy in the early prediction of epidemic curves using a classification approach. We also applied the method to forecasting influenza outbreaks in the United States from 1997–2013 using influenza-like illness (ILI) data from the Centers for Disease Control and Prevention (CDC).
RESULTS: We made the following observations. First, the DP model performed as well as RF in identifying several of the simulated epidemics. Second, the DP model correctly forecasted the peak time several days in advance for most of the simulated epidemics. Third, the accuracy of identifying epidemics different from those already observed improved with additional data, as expected. Fourth, both methods correctly classified epidemics with higher reproduction numbers (R) with a higher accuracy compared to epidemics with lower R values. Lastly, in the classification of seasonal influenza epidemics based on ILI data from the CDC, the methods’ performance was comparable.
CONCLUSIONS: Although RF requires less computational time compared to the DP model, the algorithm is fully supervised implying that epidemic curves different from those previously observed will always be misclassified. In contrast, the DP model can be unsupervised, semi-supervised or fully supervised. Since both methods have their relative merits, an approach that uses both RF and the DP model could be beneficial.
format Online
Article
Text
id pubmed-3901791
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3901791 2014-02-06 BMC Infect Dis Research Article BioMed Central 2014-01-09 Text en
Copyright © 2014 Nsoesie et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Dirichlet process model for classifying and forecasting epidemic curves
topic Research Article