
The Reliability of Tweets as a Supplementary Method of Seasonal Influenza Surveillance

BACKGROUND: Existing influenza surveillance in the United States is focused on the collection of data from sentinel physicians and hospitals; however, the compilation and distribution of reports are usually delayed by up to 2 weeks. With the popularity of social media growing, the Internet is a sour...


Bibliographic Details
Main Authors: Aslam, Anoshé A, Tsou, Ming-Hsiang, Spitzberg, Brian H, An, Li, Gawron, J Mark, Gupta, Dipak K, Peddecord, K Michael, Nagel, Anna C, Allen, Christopher, Yang, Jiue-An, Lindsay, Suzanne
Format: Online Article Text
Language: English
Published: JMIR Publications Inc. 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4260066/
https://www.ncbi.nlm.nih.gov/pubmed/25406040
http://dx.doi.org/10.2196/jmir.3532
_version_ 1782348114655444992
author Aslam, Anoshé A
Tsou, Ming-Hsiang
Spitzberg, Brian H
An, Li
Gawron, J Mark
Gupta, Dipak K
Peddecord, K Michael
Nagel, Anna C
Allen, Christopher
Yang, Jiue-An
Lindsay, Suzanne
author_facet Aslam, Anoshé A
Tsou, Ming-Hsiang
Spitzberg, Brian H
An, Li
Gawron, J Mark
Gupta, Dipak K
Peddecord, K Michael
Nagel, Anna C
Allen, Christopher
Yang, Jiue-An
Lindsay, Suzanne
author_sort Aslam, Anoshé A
collection PubMed
description BACKGROUND: Existing influenza surveillance in the United States is focused on the collection of data from sentinel physicians and hospitals; however, the compilation and distribution of reports are usually delayed by up to 2 weeks. With the popularity of social media growing, the Internet is a source for syndromic surveillance due to the availability of large amounts of data. In this study, tweets, or posts of 140 characters or less, from the website Twitter were collected and analyzed for their potential as surveillance for seasonal influenza.
OBJECTIVE: There were three aims: (1) to improve the correlation of tweets to sentinel-provided influenza-like illness (ILI) rates by city through filtering and a machine-learning classifier, (2) to observe correlations of tweets for emergency department ILI rates by city, and (3) to explore correlations for tweets to laboratory-confirmed influenza cases in San Diego.
METHODS: Tweets containing the keyword “flu” were collected within a 17-mile radius from 11 US cities selected for population and availability of ILI data. At the end of the collection period, 159,802 tweets were used for correlation analyses with sentinel-provided ILI and emergency department ILI rates as reported by the corresponding city or county health department. Two separate methods were used to observe correlations between tweets and ILI rates: filtering the tweets by type (non-retweets, retweets, tweets with a URL, tweets without a URL), and the use of a machine-learning classifier that determined whether a tweet was “valid”, or from a user who was likely ill with the flu.
RESULTS: Correlations varied by city but general trends were observed. Non-retweets and tweets without a URL had higher and more significant (P<.05) correlations than retweets and tweets with a URL. Correlations of tweets to emergency department ILI rates were higher than the correlations observed for sentinel-provided ILI for most of the cities. The machine-learning classifier yielded the highest correlations for many of the cities when using the sentinel-provided or emergency department ILI as well as the number of laboratory-confirmed influenza cases in San Diego. High correlation values (r=.93) with significance at P<.001 were observed for laboratory-confirmed influenza cases for most categories and tweets determined to be valid by the classifier.
CONCLUSIONS: Compared to tweet analyses in the previous influenza season, this study demonstrated increased accuracy in using Twitter as a supplementary surveillance tool for influenza as better filtering and classification methods yielded higher correlations for the 2013-2014 influenza season than those found for tweets in the previous influenza season, where emergency department ILI rates were better correlated to tweets than sentinel-provided ILI rates. Further investigations in the field would require expansion with regard to the location that the tweets are collected from, as well as the availability of more ILI data.
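The filtering-by-type step and the correlation analysis described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the `RT @` prefix used to detect retweets, the URL pattern, and the `(week_index, text)` input layout are all assumptions made for illustration, and the machine-learning classifier is not reproduced here. The sketch only shows how weekly counts per tweet category could be compared against a weekly ILI series using Pearson's r.

```python
import re
from math import sqrt

# Assumption: a tweet "has a URL" if it contains an http(s) link.
URL_PATTERN = re.compile(r"https?://\S+")


def tweet_types(text):
    """Return the two filter categories a tweet falls into.

    Assumption: retweets are recognized by the conventional "RT @" prefix;
    the abstract does not say how retweets were identified.
    """
    is_retweet = text.startswith("RT @")
    has_url = URL_PATTERN.search(text) is not None
    return {
        "retweet" if is_retweet else "non_retweet",
        "with_url" if has_url else "without_url",
    }


def weekly_counts(tweets, category, n_weeks):
    """Count tweets of one category per week.

    `tweets` is a hypothetical list of (week_index, text) pairs.
    """
    counts = [0] * n_weeks
    for week, text in tweets:
        if category in tweet_types(text):
            counts[week] += 1
    return counts


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)
```

Under these assumptions, a per-city analysis would call `weekly_counts(tweets, "non_retweet", n_weeks)` for each category and pass the result, together with that city's weekly ILI rates, to `pearson_r`. Significance testing (the P values reported in the abstract) is left out of the sketch.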
format Online
Article
Text
id pubmed-4260066
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher JMIR Publications Inc.
record_format MEDLINE/PubMed
spelling pubmed-42600662014-12-10 The Reliability of Tweets as a Supplementary Method of Seasonal Influenza Surveillance Aslam, Anoshé A Tsou, Ming-Hsiang Spitzberg, Brian H An, Li Gawron, J Mark Gupta, Dipak K Peddecord, K Michael Nagel, Anna C Allen, Christopher Yang, Jiue-An Lindsay, Suzanne J Med Internet Res Original Paper BACKGROUND: Existing influenza surveillance in the United States is focused on the collection of data from sentinel physicians and hospitals; however, the compilation and distribution of reports are usually delayed by up to 2 weeks. With the popularity of social media growing, the Internet is a source for syndromic surveillance due to the availability of large amounts of data. In this study, tweets, or posts of 140 characters or less, from the website Twitter were collected and analyzed for their potential as surveillance for seasonal influenza. OBJECTIVE: There were three aims: (1) to improve the correlation of tweets to sentinel-provided influenza-like illness (ILI) rates by city through filtering and a machine-learning classifier, (2) to observe correlations of tweets for emergency department ILI rates by city, and (3) to explore correlations for tweets to laboratory-confirmed influenza cases in San Diego. METHODS: Tweets containing the keyword “flu” were collected within a 17-mile radius from 11 US cities selected for population and availability of ILI data. At the end of the collection period, 159,802 tweets were used for correlation analyses with sentinel-provided ILI and emergency department ILI rates as reported by the corresponding city or county health department. Two separate methods were used to observe correlations between tweets and ILI rates: filtering the tweets by type (non-retweets, retweets, tweets with a URL, tweets without a URL), and the use of a machine-learning classifier that determined whether a tweet was “valid”, or from a user who was likely ill with the flu. 
RESULTS: Correlations varied by city but general trends were observed. Non-retweets and tweets without a URL had higher and more significant (P<.05) correlations than retweets and tweets with a URL. Correlations of tweets to emergency department ILI rates were higher than the correlations observed for sentinel-provided ILI for most of the cities. The machine-learning classifier yielded the highest correlations for many of the cities when using the sentinel-provided or emergency department ILI as well as the number of laboratory-confirmed influenza cases in San Diego. High correlation values (r=.93) with significance at P<.001 were observed for laboratory-confirmed influenza cases for most categories and tweets determined to be valid by the classifier. CONCLUSIONS: Compared to tweet analyses in the previous influenza season, this study demonstrated increased accuracy in using Twitter as a supplementary surveillance tool for influenza as better filtering and classification methods yielded higher correlations for the 2013-2014 influenza season than those found for tweets in the previous influenza season, where emergency department ILI rates were better correlated to tweets than sentinel-provided ILI rates. Further investigations in the field would require expansion with regard to the location that the tweets are collected from, as well as the availability of more ILI data. JMIR Publications Inc. 2014-11-14 /pmc/articles/PMC4260066/ /pubmed/25406040 http://dx.doi.org/10.2196/jmir.3532 Text en ©Anoshé A Aslam, Ming-Hsiang Tsou, Brian H Spitzberg, Li An, J Mark Gawron, Dipak K Gupta, K Michael Peddecord, Anna C Nagel, Christopher Allen, Jiue-An Yang, Suzanne Lindsay. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 14.11.2014. 
http://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Aslam, Anoshé A
Tsou, Ming-Hsiang
Spitzberg, Brian H
An, Li
Gawron, J Mark
Gupta, Dipak K
Peddecord, K Michael
Nagel, Anna C
Allen, Christopher
Yang, Jiue-An
Lindsay, Suzanne
The Reliability of Tweets as a Supplementary Method of Seasonal Influenza Surveillance
title The Reliability of Tweets as a Supplementary Method of Seasonal Influenza Surveillance
title_full The Reliability of Tweets as a Supplementary Method of Seasonal Influenza Surveillance
title_fullStr The Reliability of Tweets as a Supplementary Method of Seasonal Influenza Surveillance
title_full_unstemmed The Reliability of Tweets as a Supplementary Method of Seasonal Influenza Surveillance
title_short The Reliability of Tweets as a Supplementary Method of Seasonal Influenza Surveillance
title_sort reliability of tweets as a supplementary method of seasonal influenza surveillance
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4260066/
https://www.ncbi.nlm.nih.gov/pubmed/25406040
http://dx.doi.org/10.2196/jmir.3532