Forecasting dengue and influenza incidences using a sparse representation of Google trends, electronic health records, and time series data

Bibliographic Details
Main Authors: Rangarajan, Prashant; Mody, Sandeep K.; Marathe, Madhav
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6894887/
https://www.ncbi.nlm.nih.gov/pubmed/31751346
http://dx.doi.org/10.1371/journal.pcbi.1007518
author Rangarajan, Prashant
Mody, Sandeep K.
Marathe, Madhav
collection PubMed
description Dengue and influenza-like illness (ILI) are two of the leading causes of viral infection in the world and it is estimated that more than half the world’s population is at risk for developing these infections. It is therefore important to develop accurate methods for forecasting dengue and ILI incidences. Since data from multiple sources (such as dengue and ILI case counts, electronic health records and frequency of multiple internet search terms from Google Trends) can improve forecasts, standard time series analysis methods are inadequate to estimate all the parameter values from the limited amount of data available if we use multiple sources. In this paper, we use a computationally efficient implementation of the known variable selection method that we call the Autoregressive Likelihood Ratio (ARLR) method. This method combines sparse representation of time series data, electronic health records data (for ILI) and Google Trends data to forecast dengue and ILI incidences. This sparse representation method uses an algorithm that maximizes an appropriate likelihood ratio at every step. Using numerical experiments, we demonstrate that our method recovers the underlying sparse model much more accurately than the lasso method. We apply our method to dengue case count data from five countries/states: Brazil, Mexico, Singapore, Taiwan, and Thailand and to ILI case count data from the United States. Numerical experiments show that our method outperforms existing time series forecasting methods in forecasting the dengue and ILI case counts. In particular, our method gives an 18 percent forecast error reduction over a leading method that also uses data from multiple sources. It also performs better than other methods in predicting the peak value of the case count and the peak time.
format Online
Article
Text
id pubmed-6894887
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6894887 2019-12-13
PLoS Comput Biol
Research Article
Public Library of Science 2019-11-21
/pmc/articles/PMC6894887/ /pubmed/31751346 http://dx.doi.org/10.1371/journal.pcbi.1007518
Text en
© 2019 Rangarajan et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Forecasting dengue and influenza incidences using a sparse representation of Google trends, electronic health records, and time series data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6894887/
https://www.ncbi.nlm.nih.gov/pubmed/31751346
http://dx.doi.org/10.1371/journal.pcbi.1007518