Comparing Twitter data to routine data sources in public health surveillance for the 2015 Pan/Parapan American Games: an ecological study

OBJECTIVES: This study examined Twitter for public health surveillance during a mass gathering in Canada with two objectives: to explore the feasibility of acquiring, categorizing and using geolocated Twitter data and to compare Twitter data against other data sources used for Pan Parapan American G...

Bibliographic Details
Main Authors: Khan, Yasmin, Leung, Garvin J., Belanger, Paul, Gournis, Effie, Buckeridge, David L., Liu, Li, Li, Ye, Johnson, Ian L.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6964588/
https://www.ncbi.nlm.nih.gov/pubmed/29981081
http://dx.doi.org/10.17269/s41997-018-0059-0
_version_ 1783488489730342912
author Khan, Yasmin
Leung, Garvin J.
Belanger, Paul
Gournis, Effie
Buckeridge, David L.
Liu, Li
Li, Ye
Johnson, Ian L.
author_facet Khan, Yasmin
Leung, Garvin J.
Belanger, Paul
Gournis, Effie
Buckeridge, David L.
Liu, Li
Li, Ye
Johnson, Ian L.
author_sort Khan, Yasmin
collection PubMed
description OBJECTIVES: This study examined Twitter for public health surveillance during a mass gathering in Canada with two objectives: to explore the feasibility of acquiring, categorizing and using geolocated Twitter data and to compare Twitter data against other data sources used for Pan Parapan American Games (P/PAG) surveillance. METHODS: Syndrome definitions were created using keyword categorization to extract posts from Twitter. Categories were developed iteratively for four relevant syndromes: respiratory, gastrointestinal, heat-related illness, and influenza-like illness (ILI). All data sources corresponded to the location of Toronto, Canada. Twitter data were acquired from a publicly available stream representing a 1% random sample of tweets from June 26 to September 10, 2015. Cross-correlation analyses of time series data were conducted between Twitter and comparator surveillance data sources: emergency department visits, telephone helpline calls, laboratory testing positivity rate, reportable disease data, and temperature. RESULTS: The frequency of daily tweets that were classified into syndromes was low, with the highest mean number of daily tweets being for ILI and respiratory syndromes (22.0 and 21.6, respectively) and the lowest, for the heat syndrome (4.1). Cross-correlation analyses of Twitter data demonstrated significant correlations for heat syndrome with two data sources: telephone helpline calls (r = 0.4) and temperature data (r = 0.5). CONCLUSION: Using simple syndromes based on keyword classification of geolocated tweets, we found a correlation between tweets and two routine data sources for heat alerts, the only public health event detected during P/PAG. Further research is needed to understand the role for Twitter in surveillance. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.17269/s41997-018-0059-0) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6964588
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-69645882020-02-04 Comparing Twitter data to routine data sources in public health surveillance for the 2015 Pan/Parapan American Games: an ecological study Khan, Yasmin Leung, Garvin J. Belanger, Paul Gournis, Effie Buckeridge, David L. Liu, Li Li, Ye Johnson, Ian L. Can J Public Health Quantitative Research OBJECTIVES: This study examined Twitter for public health surveillance during a mass gathering in Canada with two objectives: to explore the feasibility of acquiring, categorizing and using geolocated Twitter data and to compare Twitter data against other data sources used for Pan Parapan American Games (P/PAG) surveillance. METHODS: Syndrome definitions were created using keyword categorization to extract posts from Twitter. Categories were developed iteratively for four relevant syndromes: respiratory, gastrointestinal, heat-related illness, and influenza-like illness (ILI). All data sources corresponded to the location of Toronto, Canada. Twitter data were acquired from a publicly available stream representing a 1% random sample of tweets from June 26 to September 10, 2015. Cross-correlation analyses of time series data were conducted between Twitter and comparator surveillance data sources: emergency department visits, telephone helpline calls, laboratory testing positivity rate, reportable disease data, and temperature. RESULTS: The frequency of daily tweets that were classified into syndromes was low, with the highest mean number of daily tweets being for ILI and respiratory syndromes (22.0 and 21.6, respectively) and the lowest, for the heat syndrome (4.1). Cross-correlation analyses of Twitter data demonstrated significant correlations for heat syndrome with two data sources: telephone helpline calls (r = 0.4) and temperature data (r = 0.5). 
CONCLUSION: Using simple syndromes based on keyword classification of geolocated tweets, we found a correlation between tweets and two routine data sources for heat alerts, the only public health event detected during P/PAG. Further research is needed to understand the role for Twitter in surveillance. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.17269/s41997-018-0059-0) contains supplementary material, which is available to authorized users. Springer International Publishing 2018-04-20 /pmc/articles/PMC6964588/ /pubmed/29981081 http://dx.doi.org/10.17269/s41997-018-0059-0 Text en © The Canadian Public Health Association 2018
spellingShingle Quantitative Research
Khan, Yasmin
Leung, Garvin J.
Belanger, Paul
Gournis, Effie
Buckeridge, David L.
Liu, Li
Li, Ye
Johnson, Ian L.
Comparing Twitter data to routine data sources in public health surveillance for the 2015 Pan/Parapan American Games: an ecological study
title Comparing Twitter data to routine data sources in public health surveillance for the 2015 Pan/Parapan American Games: an ecological study
title_full Comparing Twitter data to routine data sources in public health surveillance for the 2015 Pan/Parapan American Games: an ecological study
title_fullStr Comparing Twitter data to routine data sources in public health surveillance for the 2015 Pan/Parapan American Games: an ecological study
title_full_unstemmed Comparing Twitter data to routine data sources in public health surveillance for the 2015 Pan/Parapan American Games: an ecological study
title_short Comparing Twitter data to routine data sources in public health surveillance for the 2015 Pan/Parapan American Games: an ecological study
title_sort comparing twitter data to routine data sources in public health surveillance for the 2015 pan/parapan american games: an ecological study
topic Quantitative Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6964588/
https://www.ncbi.nlm.nih.gov/pubmed/29981081
http://dx.doi.org/10.17269/s41997-018-0059-0
work_keys_str_mv AT khanyasmin comparingtwitterdatatoroutinedatasourcesinpublichealthsurveillanceforthe2015panparapanamericangamesanecologicalstudy
AT leunggarvinj comparingtwitterdatatoroutinedatasourcesinpublichealthsurveillanceforthe2015panparapanamericangamesanecologicalstudy
AT belangerpaul comparingtwitterdatatoroutinedatasourcesinpublichealthsurveillanceforthe2015panparapanamericangamesanecologicalstudy
AT gourniseffie comparingtwitterdatatoroutinedatasourcesinpublichealthsurveillanceforthe2015panparapanamericangamesanecologicalstudy
AT buckeridgedavidl comparingtwitterdatatoroutinedatasourcesinpublichealthsurveillanceforthe2015panparapanamericangamesanecologicalstudy
AT liuli comparingtwitterdatatoroutinedatasourcesinpublichealthsurveillanceforthe2015panparapanamericangamesanecologicalstudy
AT liye comparingtwitterdatatoroutinedatasourcesinpublichealthsurveillanceforthe2015panparapanamericangamesanecologicalstudy
AT johnsonianl comparingtwitterdatatoroutinedatasourcesinpublichealthsurveillanceforthe2015panparapanamericangamesanecologicalstudy