
Improving Google Flu Trends for COVID-19 estimates using Weibo posts

While incomplete non-medical data has been integrated into prediction models for epidemics, the accuracy and the generalizability of the data are difficult to guarantee. To comprehensively evaluate the ability and applicability of using social media data to predict the development of COVID-19, a new...

Bibliographic Details
Main Authors: Guo, Shuhui, Fang, Fan, Zhou, Tao, Zhang, Wei, Guo, Qiang, Zeng, Rui, Chen, Xiaohong, Liu, Jianguo, Lu, Xin
Format: Online Article Text
Language: English
Published: Xi'an Jiaotong University. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8280378/
http://dx.doi.org/10.1016/j.dsm.2021.07.001
_version_ 1783722635357585408
author Guo, Shuhui
Fang, Fan
Zhou, Tao
Zhang, Wei
Guo, Qiang
Zeng, Rui
Chen, Xiaohong
Liu, Jianguo
Lu, Xin
author_facet Guo, Shuhui
Fang, Fan
Zhou, Tao
Zhang, Wei
Guo, Qiang
Zeng, Rui
Chen, Xiaohong
Liu, Jianguo
Lu, Xin
author_sort Guo, Shuhui
collection PubMed
description While incomplete non-medical data has been integrated into prediction models for epidemics, the accuracy and generalizability of such data are difficult to guarantee. To comprehensively evaluate the ability and applicability of social media data for predicting the development of COVID-19, a new confirmed-case prediction algorithm that improves on the Google Flu Trends algorithm, called Weibo COVID-19 Trends (WCT), is established based on the dataset of posts generated by all users in Wuhan on Sina Weibo. A genetic algorithm is designed to select the keyword set for filtering COVID-19-related posts. WCT can consistently outperform the highest average test score in the training set between daily new confirmed case counts and the prediction results. It continues to produce the best prediction results among the compared algorithms as the number of forecast days increases from one to eight, with the highest correlation score going from 0.98 (P < 0.01) to 0.86 (P < 0.01) over the entire analysis period. Additionally, WCT effectively mitigates the Google Flu Trends algorithm's shortcoming of overestimating the epidemic peak value. This study offers a highly adaptive approach to feature engineering of third-party data in epidemic prediction, providing useful insights for the early-stage prediction of newly emerging infectious diseases.
format Online
Article
Text
id pubmed-8280378
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Xi'an Jiaotong University. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd.
record_format MEDLINE/PubMed
spelling pubmed-8280378 2021-07-20 Improving Google Flu Trends for COVID-19 estimates using Weibo posts Guo, Shuhui Fang, Fan Zhou, Tao Zhang, Wei Guo, Qiang Zeng, Rui Chen, Xiaohong Liu, Jianguo Lu, Xin Data Science and Management Research Article While incomplete non-medical data has been integrated into prediction models for epidemics, the accuracy and generalizability of such data are difficult to guarantee. To comprehensively evaluate the ability and applicability of social media data for predicting the development of COVID-19, a new confirmed-case prediction algorithm that improves on the Google Flu Trends algorithm, called Weibo COVID-19 Trends (WCT), is established based on the dataset of posts generated by all users in Wuhan on Sina Weibo. A genetic algorithm is designed to select the keyword set for filtering COVID-19-related posts. WCT can consistently outperform the highest average test score in the training set between daily new confirmed case counts and the prediction results. It continues to produce the best prediction results among the compared algorithms as the number of forecast days increases from one to eight, with the highest correlation score going from 0.98 (P < 0.01) to 0.86 (P < 0.01) over the entire analysis period. Additionally, WCT effectively mitigates the Google Flu Trends algorithm's shortcoming of overestimating the epidemic peak value. This study offers a highly adaptive approach to feature engineering of third-party data in epidemic prediction, providing useful insights for the early-stage prediction of newly emerging infectious diseases. Xi'an Jiaotong University. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. 2021-09 2021-07-15 /pmc/articles/PMC8280378/ http://dx.doi.org/10.1016/j.dsm.2021.07.001 Text en © 2021 Xi'an Jiaotong University. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd.
Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
spellingShingle Research Article
Guo, Shuhui
Fang, Fan
Zhou, Tao
Zhang, Wei
Guo, Qiang
Zeng, Rui
Chen, Xiaohong
Liu, Jianguo
Lu, Xin
Improving Google Flu Trends for COVID-19 estimates using Weibo posts
title Improving Google Flu Trends for COVID-19 estimates using Weibo posts
title_full Improving Google Flu Trends for COVID-19 estimates using Weibo posts
title_fullStr Improving Google Flu Trends for COVID-19 estimates using Weibo posts
title_full_unstemmed Improving Google Flu Trends for COVID-19 estimates using Weibo posts
title_short Improving Google Flu Trends for COVID-19 estimates using Weibo posts
title_sort improving google flu trends for covid-19 estimates using weibo posts
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8280378/
http://dx.doi.org/10.1016/j.dsm.2021.07.001
work_keys_str_mv AT guoshuhui improvinggoogleflutrendsforcovid19estimatesusingweiboposts
AT fangfan improvinggoogleflutrendsforcovid19estimatesusingweiboposts
AT zhoutao improvinggoogleflutrendsforcovid19estimatesusingweiboposts
AT zhangwei improvinggoogleflutrendsforcovid19estimatesusingweiboposts
AT guoqiang improvinggoogleflutrendsforcovid19estimatesusingweiboposts
AT zengrui improvinggoogleflutrendsforcovid19estimatesusingweiboposts
AT chenxiaohong improvinggoogleflutrendsforcovid19estimatesusingweiboposts
AT liujianguo improvinggoogleflutrendsforcovid19estimatesusingweiboposts
AT luxin improvinggoogleflutrendsforcovid19estimatesusingweiboposts