
Prediction of Dengue Incidence Using Search Query Surveillance

Bibliographic Details
Main authors: Althouse, Benjamin M., Ng, Yih Yng, Cummings, Derek A. T.
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3149016/
https://www.ncbi.nlm.nih.gov/pubmed/21829744
http://dx.doi.org/10.1371/journal.pntd.0001258
collection PubMed
description BACKGROUND: Internet search data have been shown to be effective at predicting influenza incidence. This approach may be more successful for dengue, which has large variation in annual incidence and a more distinctive clinical presentation and mode of transmission.
METHODS: We gathered freely available dengue incidence data from Singapore (weekly incidence, 2004–2011) and Bangkok (monthly incidence, 2004–2011). Internet search data for the same period were downloaded from Google Insights for Search. Search terms were chosen to reflect three categories of dengue-related search: nomenclature, signs/symptoms, and treatment. We compared three models to predict incidence: a step-down linear regression, a generalized boosted regression, and a negative binomial regression. Logistic regression and Support Vector Machine (SVM) models were used to predict a binary outcome defined by whether dengue incidence exceeded a chosen threshold. Incidence prediction models were assessed using [Image: see text] and the Pearson correlation between predicted and observed dengue incidence. Logistic and SVM model performance was assessed by the area under the receiver operating characteristic curve (AUC). Models were validated using multiple cross-validation techniques.
RESULTS: The linear model selected by AIC step-down was found to be superior to the other models considered. In Bangkok, the model has an [Image: see text] and a correlation of 0.869 between fitted and observed incidence. In Singapore, the model has an [Image: see text] and a correlation of 0.931. In both Singapore and Bangkok, SVM models outperformed logistic regression in predicting periods of high incidence. The AUC for the SVM models using the 75th-percentile cutoff is 0.906 in Singapore and 0.960 in Bangkok.
CONCLUSIONS: Internet search terms predict dengue incidence, and periods of high incidence, with high accuracy and may prove useful in areas with underdeveloped surveillance systems. The methods presented here use freely available data and analysis tools and can be readily adapted to other settings.
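The abstract names its model-selection and evaluation steps without showing them. Below is a minimal, hypothetical pure-Python sketch of those steps, not the authors' code. It assumes: backward (step-down) selection of linear-regression predictors by AIC, computed for Gaussian least squares as n·log(RSS/n) + 2k with k the number of coefficients (the convention R's extractAIC uses for linear models); a binary high-incidence label meaning incidence strictly above the 75th percentile, with linear interpolation between order statistics; the Pearson correlation; R² as the coefficient of determination, assumed here to be the metric rendered as an image in the abstract; and the ROC AUC computed as the probability that a high-incidence period outscores a low-incidence one. All function names are illustrative. The SVM and logistic classifiers themselves are not reproduced; any model score can be passed to roc_auc.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch: AIC step-down linear regression plus the evaluation
# metrics named in the abstract. Not the authors' implementation.


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols_fit(cols: List[List[float]], y: Sequence[float]) -> Tuple[float, List[float]]:
    """Least squares of y on an intercept plus `cols`; returns (RSS, fitted values)."""
    n = len(y)
    design = [[1.0] * n] + cols
    A = [[sum(u[t] * v[t] for t in range(n)) for v in design] for u in design]
    b = [sum(u[t] * y[t] for t in range(n)) for u in design]
    beta = solve(A, b)
    fitted = [sum(bj * col[t] for bj, col in zip(beta, design)) for t in range(n)]
    rss = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    return rss, fitted


def aic(names: List[str], preds: Dict[str, List[float]], y: Sequence[float]) -> float:
    """Gaussian AIC up to a constant: n*log(RSS/n) + 2*(coefficients). Assumes RSS > 0."""
    rss, _ = ols_fit([preds[k] for k in names], y)
    n = len(y)
    return n * math.log(rss / n) + 2 * (len(names) + 1)


def step_down(preds: Dict[str, List[float]], y: Sequence[float]) -> Tuple[List[str], float]:
    """Backward elimination: drop the term whose removal lowers AIC most; stop when none does."""
    current = list(preds)
    best = aic(current, preds, y)
    while current:
        cand, drop = min((aic([c for c in current if c != d], preds, y), d) for d in current)
        if cand >= best:
            break
        current.remove(drop)
        best = cand
    return current, best


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def high_incidence_labels(incidence: Sequence[float], q: float = 75.0) -> List[int]:
    """1 where incidence strictly exceeds the q-th percentile cutoff, else 0."""
    cutoff = percentile(incidence, q)
    return [1 if x > cutoff else 0 for x in incidence]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - RSS / TSS."""
    mean = sum(observed) / len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    tss = sum((o - mean) ** 2 for o in observed)
    return 1.0 - rss / tss


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a positive > score of a negative), ties counted as 1/2."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

With synthetic data in which a column x2 carries no information about incidence y = 3 + 2·x1 + noise, step_down would be expected to keep x1 and drop x2, because removing x2 saves the 2-point AIC penalty at almost no cost in RSS. A real analysis would more likely rely on a statistics package and would also need the cross-validation the abstract mentions, which this sketch omits.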
id pubmed-3149016
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3149016 2011-08-09 PLoS Negl Trop Dis Research Article. Public Library of Science 2011-08-02. Text en. Althouse et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article