Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance

We present a machine learning-based methodology capable of providing real-time (“nowcast”) and forecast estimates of influenza activity in the US by leveraging data from multiple data sources including: Google searches, Twitter microblogs, nearly real-time hospital visit records, and data from a par...

Bibliographic Details
Main authors: Santillana, Mauricio; Nguyen, André T.; Dredze, Mark; Paul, Michael J.; Nsoesie, Elaine O.; Brownstein, John S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4626021/
https://www.ncbi.nlm.nih.gov/pubmed/26513245
http://dx.doi.org/10.1371/journal.pcbi.1004513
author Santillana, Mauricio
Nguyen, André T.
Dredze, Mark
Paul, Michael J.
Nsoesie, Elaine O.
Brownstein, John S.
collection PubMed
description We present a machine learning-based methodology capable of providing real-time (“nowcast”) and forecast estimates of influenza activity in the US by leveraging data from multiple data sources including: Google searches, Twitter microblogs, nearly real-time hospital visit records, and data from a participatory surveillance system. Our main contribution consists of combining multiple influenza-like illnesses (ILI) activity estimates, generated independently with each data source, into a single prediction of ILI utilizing machine learning ensemble approaches. Our methodology exploits the information in each data source and produces accurate weekly ILI predictions for up to four weeks ahead of the release of CDC’s ILI reports. We evaluate the predictive ability of our ensemble approach during the 2013–2014 (retrospective) and 2014–2015 (live) flu seasons for each of the four weekly time horizons. Our ensemble approach demonstrates several advantages: (1) our ensemble method’s predictions outperform every prediction using each data source independently, (2) our methodology can produce predictions one week ahead of GFT’s real-time estimates with comparable accuracy, and (3) our two and three week forecast estimates have comparable accuracy to real-time predictions using an autoregressive model. Moreover, our results show that considerable insight is gained from incorporating disparate data streams, in the form of social media and crowd sourced data, into influenza predictions in all time horizons.
format Online
Article
Text
id pubmed-4626021
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4626021 2015-11-06 Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance Santillana, Mauricio Nguyen, André T. Dredze, Mark Paul, Michael J. Nsoesie, Elaine O. Brownstein, John S. PLoS Comput Biol Research Article
Public Library of Science 2015-10-29 /pmc/articles/PMC4626021/ /pubmed/26513245 http://dx.doi.org/10.1371/journal.pcbi.1004513 Text en © 2015 Santillana et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance
topic Research Article