Analysing Twitter and web queries for flu trend prediction

BACKGROUND: Social media platforms encourage people to share diverse aspects of their daily life. Among these, shared health-related information might be used to infer health status and incidence rates for specific conditions or symptoms. In this work, we present an infodemiology study that evaluate...

Bibliographic Details
Main Authors: Santos, José Carlos, Matos, Sérgio
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4108891/
https://www.ncbi.nlm.nih.gov/pubmed/25077431
http://dx.doi.org/10.1186/1742-4682-11-S1-S6
author Santos, José Carlos
Matos, Sérgio
collection PubMed
description BACKGROUND: Social media platforms encourage people to share diverse aspects of their daily life. Among these, shared health-related information might be used to infer health status and incidence rates for specific conditions or symptoms. In this work, we present an infodemiology study that evaluates the use of Twitter messages and search engine query logs to estimate and predict the incidence rate of influenza-like illness in Portugal. RESULTS: Based on a manually classified dataset of 2704 tweets from Portugal, we selected a set of 650 textual features to train a Naïve Bayes classifier to identify tweets mentioning flu or flu-like illness or symptoms. We obtained a precision of 0.78 and an F-measure of 0.83, based on cross-validation over the complete annotated set. Furthermore, we trained a multiple linear regression model to estimate the health-monitoring data from the Influenzanet project, using as predictors the relative frequencies obtained from the tweet classification results and from query logs, and achieved a correlation ratio of 0.89 (p < 0.001). These classification and regression models were also applied to estimate the flu incidence in the following flu season, achieving a correlation of 0.72. CONCLUSIONS: Previous studies addressing the estimation of disease incidence based on user-generated content have mostly focused on the English language. Our results further validate those studies and show that, by changing the initial steps of data preprocessing and feature extraction and selection, the proposed approaches can be adapted to other languages. Additionally, we investigated whether the predictive model created can be applied to data from the subsequent flu season. In this case, although the prediction result was good, an initial phase to adapt the regression model could be necessary to achieve more robust results.
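The abstract above reports the classifier's precision (0.78) and F-measure (0.83) but not its recall. As a minimal sketch, assuming the F-measure is the balanced F1 score (the record does not say which variant was used), the implied recall can be recovered by inverting F1 = 2PR / (P + R). The function names below are illustrative and do not come from the article, and because the inputs are rounded the result is only approximate.

```python
def recall_from_precision_and_f1(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for recall R, giving R = F1 * P / (2P - F1)."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        # F1 can never reach 2P, so such a pair is inconsistent.
        raise ValueError("no valid recall exists for this precision/F1 pair")
    return f1 * precision / denominator


def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract; the F1 interpretation is an assumption.
implied_recall = recall_from_precision_and_f1(0.78, 0.83)
print(round(implied_recall, 2))  # prints 0.89
```

Under that assumption the classifier would have favoured recall over precision, since an implied recall near 0.89 is higher than the reported precision of 0.78.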
format Online
Article
Text
id pubmed-4108891
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4108891 2014-08-04 Analysing Twitter and web queries for flu trend prediction Santos, José Carlos Matos, Sérgio Theor Biol Med Model Research BioMed Central 2014-05-07 /pmc/articles/PMC4108891/ /pubmed/25077431 http://dx.doi.org/10.1186/1742-4682-11-S1-S6 Text en Copyright © 2014 Santos and Matos; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Analysing Twitter and web queries for flu trend prediction
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4108891/
https://www.ncbi.nlm.nih.gov/pubmed/25077431
http://dx.doi.org/10.1186/1742-4682-11-S1-S6