Cargando…

Evaluating Google, Twitter, and Wikipedia as Tools for Influenza Surveillance Using Bayesian Change Point Analysis: A Comparative Analysis

BACKGROUND: Traditional influenza surveillance relies on influenza-like illness (ILI) syndrome that is reported by health care providers. It primarily captures individuals who seek medical care and misses those who do not. Recently, Web-based data sources have been studied for application to public...

Descripción completa

Detalles Bibliográficos
Autores principales: Sharpe, J Danielle, Hopkins, Richard S, Cook, Robert L, Striley, Catherine W
Formato: Online Artículo Texto
Lenguaje:English
Publicado: JMIR Publications 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5095368/
https://www.ncbi.nlm.nih.gov/pubmed/27765731
http://dx.doi.org/10.2196/publichealth.5901
_version_ 1782465283868327936
author Sharpe, J Danielle
Hopkins, Richard S
Cook, Robert L
Striley, Catherine W
author_facet Sharpe, J Danielle
Hopkins, Richard S
Cook, Robert L
Striley, Catherine W
author_sort Sharpe, J Danielle
collection PubMed
description BACKGROUND: Traditional influenza surveillance relies on influenza-like illness (ILI) syndrome that is reported by health care providers. It primarily captures individuals who seek medical care and misses those who do not. Recently, Web-based data sources have been studied for application to public health surveillance, as there is a growing number of people who search, post, and tweet about their illnesses before seeking medical care. Existing research has shown some promise of using data from Google, Twitter, and Wikipedia to complement traditional surveillance for ILI. However, past studies have evaluated these Web-based sources individually or dually without comparing all 3 of them, and it would be beneficial to know which of the Web-based sources performs best in order to be considered to complement traditional methods. OBJECTIVE: The objective of this study is to comparatively analyze Google, Twitter, and Wikipedia by examining which best corresponds with Centers for Disease Control and Prevention (CDC) ILI data. It was hypothesized that Wikipedia will best correspond with CDC ILI data as previous research found it to be least influenced by high media coverage in comparison with Google and Twitter. METHODS: Publicly available, deidentified data were collected from the CDC, Google Flu Trends, HealthTweets, and Wikipedia for the 2012-2015 influenza seasons. Bayesian change point analysis was used to detect seasonal changes, or change points, in each of the data sources. Change points in Google, Twitter, and Wikipedia that occurred during the exact week, 1 preceding week, or 1 week after the CDC’s change points were compared with the CDC data as the gold standard. All analyses were conducted using the R package “bcp” version 4.0.0 in RStudio version 0.99.484 (RStudio Inc). In addition, sensitivity and positive predictive values (PPV) were calculated for Google, Twitter, and Wikipedia. RESULTS: During the 2012-2015 influenza seasons, a high sensitivity of 92% was found for Google, whereas the PPV for Google was 85%. A low sensitivity of 50% was calculated for Twitter; a low PPV of 43% was found for Twitter also. Wikipedia had the lowest sensitivity of 33% and lowest PPV of 40%. CONCLUSIONS: Of the 3 Web-based sources, Google had the best combination of sensitivity and PPV in detecting Bayesian change points in influenza-related data streams. Findings demonstrated that change points in Google, Twitter, and Wikipedia data occasionally aligned well with change points captured in CDC ILI data, yet these sources did not detect all changes in CDC data and should be further studied and developed.
format Online
Article
Text
id pubmed-5095368
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-50953682016-11-17 Evaluating Google, Twitter, and Wikipedia as Tools for Influenza Surveillance Using Bayesian Change Point Analysis: A Comparative Analysis Sharpe, J Danielle Hopkins, Richard S Cook, Robert L Striley, Catherine W JMIR Public Health Surveill Original Paper BACKGROUND: Traditional influenza surveillance relies on influenza-like illness (ILI) syndrome that is reported by health care providers. It primarily captures individuals who seek medical care and misses those who do not. Recently, Web-based data sources have been studied for application to public health surveillance, as there is a growing number of people who search, post, and tweet about their illnesses before seeking medical care. Existing research has shown some promise of using data from Google, Twitter, and Wikipedia to complement traditional surveillance for ILI. However, past studies have evaluated these Web-based sources individually or dually without comparing all 3 of them, and it would be beneficial to know which of the Web-based sources performs best in order to be considered to complement traditional methods. OBJECTIVE: The objective of this study is to comparatively analyze Google, Twitter, and Wikipedia by examining which best corresponds with Centers for Disease Control and Prevention (CDC) ILI data. It was hypothesized that Wikipedia will best correspond with CDC ILI data as previous research found it to be least influenced by high media coverage in comparison with Google and Twitter. METHODS: Publicly available, deidentified data were collected from the CDC, Google Flu Trends, HealthTweets, and Wikipedia for the 2012-2015 influenza seasons. Bayesian change point analysis was used to detect seasonal changes, or change points, in each of the data sources. Change points in Google, Twitter, and Wikipedia that occurred during the exact week, 1 preceding week, or 1 week after the CDC’s change points were compared with the CDC data as the gold standard. All analyses were conducted using the R package “bcp” version 4.0.0 in RStudio version 0.99.484 (RStudio Inc). In addition, sensitivity and positive predictive values (PPV) were calculated for Google, Twitter, and Wikipedia. RESULTS: During the 2012-2015 influenza seasons, a high sensitivity of 92% was found for Google, whereas the PPV for Google was 85%. A low sensitivity of 50% was calculated for Twitter; a low PPV of 43% was found for Twitter also. Wikipedia had the lowest sensitivity of 33% and lowest PPV of 40%. CONCLUSIONS: Of the 3 Web-based sources, Google had the best combination of sensitivity and PPV in detecting Bayesian change points in influenza-related data streams. Findings demonstrated that change points in Google, Twitter, and Wikipedia data occasionally aligned well with change points captured in CDC ILI data, yet these sources did not detect all changes in CDC data and should be further studied and developed. JMIR Publications 2016-10-20 /pmc/articles/PMC5095368/ /pubmed/27765731 http://dx.doi.org/10.2196/publichealth.5901 Text en ©J Danielle Sharpe, Richard S Hopkins, Robert L Cook, Catherine W Striley. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 20.10.2016. https://creativecommons.org/licenses/by/2.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/ (https://creativecommons.org/licenses/by/2.0/) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.
spellingShingle Original Paper
Sharpe, J Danielle
Hopkins, Richard S
Cook, Robert L
Striley, Catherine W
Evaluating Google, Twitter, and Wikipedia as Tools for Influenza Surveillance Using Bayesian Change Point Analysis: A Comparative Analysis
title Evaluating Google, Twitter, and Wikipedia as Tools for Influenza Surveillance Using Bayesian Change Point Analysis: A Comparative Analysis
title_full Evaluating Google, Twitter, and Wikipedia as Tools for Influenza Surveillance Using Bayesian Change Point Analysis: A Comparative Analysis
title_fullStr Evaluating Google, Twitter, and Wikipedia as Tools for Influenza Surveillance Using Bayesian Change Point Analysis: A Comparative Analysis
title_full_unstemmed Evaluating Google, Twitter, and Wikipedia as Tools for Influenza Surveillance Using Bayesian Change Point Analysis: A Comparative Analysis
title_short Evaluating Google, Twitter, and Wikipedia as Tools for Influenza Surveillance Using Bayesian Change Point Analysis: A Comparative Analysis
title_sort evaluating google, twitter, and wikipedia as tools for influenza surveillance using bayesian change point analysis: a comparative analysis
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5095368/
https://www.ncbi.nlm.nih.gov/pubmed/27765731
http://dx.doi.org/10.2196/publichealth.5901
work_keys_str_mv AT sharpejdanielle evaluatinggoogletwitterandwikipediaastoolsforinfluenzasurveillanceusingbayesianchangepointanalysisacomparativeanalysis
AT hopkinsrichards evaluatinggoogletwitterandwikipediaastoolsforinfluenzasurveillanceusingbayesianchangepointanalysisacomparativeanalysis
AT cookrobertl evaluatinggoogletwitterandwikipediaastoolsforinfluenzasurveillanceusingbayesianchangepointanalysisacomparativeanalysis
AT strileycatherinew evaluatinggoogletwitterandwikipediaastoolsforinfluenzasurveillanceusingbayesianchangepointanalysisacomparativeanalysis