Cargando…

Long Short-term Memory–Based Prediction of the Spread of Influenza-Like Illness Leveraging Surveillance, Weather, and Twitter Data: Model Development and Validation

BACKGROUND: The potential to harness the plurality of available data in real time along with advanced data analytics for the accurate prediction of influenza-like illness (ILI) outbreaks has gained significant scientific interest. Different methodologies based on the use of machine learning techniqu...

Descripción completa

Detalles Bibliográficos
Autores principales: Athanasiou, Maria, Fragkozidis, Georgios, Zarkogianni, Konstantia, Nikita, Konstantina S
Formato: Online Artículo Texto
Lenguaje:English
Publicado: JMIR Publications 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9941907/
https://www.ncbi.nlm.nih.gov/pubmed/36745490
http://dx.doi.org/10.2196/42519
_version_ 1784891386358661120
author Athanasiou, Maria
Fragkozidis, Georgios
Zarkogianni, Konstantia
Nikita, Konstantina S
author_facet Athanasiou, Maria
Fragkozidis, Georgios
Zarkogianni, Konstantia
Nikita, Konstantina S
author_sort Athanasiou, Maria
collection PubMed
description BACKGROUND: The potential to harness the plurality of available data in real time along with advanced data analytics for the accurate prediction of influenza-like illness (ILI) outbreaks has gained significant scientific interest. Different methodologies based on the use of machine learning techniques and traditional and alternative data sources, such as ILI surveillance reports, weather reports, search engine queries, and social media, have been explored with the ultimate goal of being used in the development of electronic surveillance systems that could complement existing monitoring resources. OBJECTIVE: The scope of this study was to investigate for the first time the combined use of ILI surveillance data, weather data, and Twitter data along with deep learning techniques toward the development of prediction models able to nowcast and forecast weekly ILI cases. By assessing the predictive power of both traditional and alternative data sources on the use case of ILI, this study aimed to provide a novel approach for corroborating evidence and enhancing accuracy and reliability in the surveillance of infectious diseases. METHODS: The model’s input space consisted of information related to weekly ILI surveillance, web-based social (eg, Twitter) behavior, and weather conditions. For the design and development of the model, relevant data corresponding to the period of 2010 to 2019 and focusing on the Greek population and weather were collected. Long short-term memory (LSTM) neural networks were leveraged to efficiently handle the sequential and nonlinear nature of the multitude of collected data. The 3 data categories were first used separately for training 3 LSTM-based primary models. Subsequently, different transfer learning (TL) approaches were explored with the aim of creating various feature spaces combining the features extracted from the corresponding primary models’ LSTM layers for the latter to feed a dense layer. RESULTS: The primary model that learned from weather data yielded better forecast accuracy (root mean square error [RMSE]=0.144; Pearson correlation coefficient [PCC]=0.801) than the model trained with ILI historical data (RMSE=0.159; PCC=0.794). The best performance was achieved by the TL-based model leveraging the combination of the 3 data categories (RMSE=0.128; PCC=0.822). CONCLUSIONS: The superiority of the TL-based model, which considers Twitter data, weather data, and ILI surveillance data, reflects the potential of alternative public sources to enhance accurate and reliable prediction of ILI spread. Despite its focus on the use case of Greece, the proposed approach can be generalized to other locations, populations, and social media platforms to support the surveillance of infectious diseases with the ultimate goal of reinforcing preparedness for future epidemics.
format Online
Article
Text
id pubmed-9941907
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-99419072023-02-22 Long Short-term Memory–Based Prediction of the Spread of Influenza-Like Illness Leveraging Surveillance, Weather, and Twitter Data: Model Development and Validation Athanasiou, Maria Fragkozidis, Georgios Zarkogianni, Konstantia Nikita, Konstantina S J Med Internet Res Original Paper BACKGROUND: The potential to harness the plurality of available data in real time along with advanced data analytics for the accurate prediction of influenza-like illness (ILI) outbreaks has gained significant scientific interest. Different methodologies based on the use of machine learning techniques and traditional and alternative data sources, such as ILI surveillance reports, weather reports, search engine queries, and social media, have been explored with the ultimate goal of being used in the development of electronic surveillance systems that could complement existing monitoring resources. OBJECTIVE: The scope of this study was to investigate for the first time the combined use of ILI surveillance data, weather data, and Twitter data along with deep learning techniques toward the development of prediction models able to nowcast and forecast weekly ILI cases. By assessing the predictive power of both traditional and alternative data sources on the use case of ILI, this study aimed to provide a novel approach for corroborating evidence and enhancing accuracy and reliability in the surveillance of infectious diseases. METHODS: The model’s input space consisted of information related to weekly ILI surveillance, web-based social (eg, Twitter) behavior, and weather conditions. For the design and development of the model, relevant data corresponding to the period of 2010 to 2019 and focusing on the Greek population and weather were collected. Long short-term memory (LSTM) neural networks were leveraged to efficiently handle the sequential and nonlinear nature of the multitude of collected data. The 3 data categories were first used separately for training 3 LSTM-based primary models. Subsequently, different transfer learning (TL) approaches were explored with the aim of creating various feature spaces combining the features extracted from the corresponding primary models’ LSTM layers for the latter to feed a dense layer. RESULTS: The primary model that learned from weather data yielded better forecast accuracy (root mean square error [RMSE]=0.144; Pearson correlation coefficient [PCC]=0.801) than the model trained with ILI historical data (RMSE=0.159; PCC=0.794). The best performance was achieved by the TL-based model leveraging the combination of the 3 data categories (RMSE=0.128; PCC=0.822). CONCLUSIONS: The superiority of the TL-based model, which considers Twitter data, weather data, and ILI surveillance data, reflects the potential of alternative public sources to enhance accurate and reliable prediction of ILI spread. Despite its focus on the use case of Greece, the proposed approach can be generalized to other locations, populations, and social media platforms to support the surveillance of infectious diseases with the ultimate goal of reinforcing preparedness for future epidemics. JMIR Publications 2023-02-06 /pmc/articles/PMC9941907/ /pubmed/36745490 http://dx.doi.org/10.2196/42519 Text en ©Maria Athanasiou, Georgios Fragkozidis, Konstantia Zarkogianni, Konstantina S Nikita. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 06.02.2023. https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Athanasiou, Maria
Fragkozidis, Georgios
Zarkogianni, Konstantia
Nikita, Konstantina S
Long Short-term Memory–Based Prediction of the Spread of Influenza-Like Illness Leveraging Surveillance, Weather, and Twitter Data: Model Development and Validation
title Long Short-term Memory–Based Prediction of the Spread of Influenza-Like Illness Leveraging Surveillance, Weather, and Twitter Data: Model Development and Validation
title_full Long Short-term Memory–Based Prediction of the Spread of Influenza-Like Illness Leveraging Surveillance, Weather, and Twitter Data: Model Development and Validation
title_fullStr Long Short-term Memory–Based Prediction of the Spread of Influenza-Like Illness Leveraging Surveillance, Weather, and Twitter Data: Model Development and Validation
title_full_unstemmed Long Short-term Memory–Based Prediction of the Spread of Influenza-Like Illness Leveraging Surveillance, Weather, and Twitter Data: Model Development and Validation
title_short Long Short-term Memory–Based Prediction of the Spread of Influenza-Like Illness Leveraging Surveillance, Weather, and Twitter Data: Model Development and Validation
title_sort long short-term memory–based prediction of the spread of influenza-like illness leveraging surveillance, weather, and twitter data: model development and validation
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9941907/
https://www.ncbi.nlm.nih.gov/pubmed/36745490
http://dx.doi.org/10.2196/42519
work_keys_str_mv AT athanasioumaria longshorttermmemorybasedpredictionofthespreadofinfluenzalikeillnessleveragingsurveillanceweatherandtwitterdatamodeldevelopmentandvalidation
AT fragkozidisgeorgios longshorttermmemorybasedpredictionofthespreadofinfluenzalikeillnessleveragingsurveillanceweatherandtwitterdatamodeldevelopmentandvalidation
AT zarkogiannikonstantia longshorttermmemorybasedpredictionofthespreadofinfluenzalikeillnessleveragingsurveillanceweatherandtwitterdatamodeldevelopmentandvalidation
AT nikitakonstantinas longshorttermmemorybasedpredictionofthespreadofinfluenzalikeillnessleveragingsurveillanceweatherandtwitterdatamodeldevelopmentandvalidation