Enhancing COVID-19 Epidemic Forecasting Accuracy by Combining Real-time and Historical Data From Multiple Internet-Based Sources: Analysis of Social Media Data, Online News Articles, and Search Queries
BACKGROUND: The SARS-CoV-2 virus and its variants pose extraordinary challenges for public health worldwide. Timely and accurate forecasting of the COVID-19 epidemic is key to sustaining interventions and policies and efficient resource allocation. Internet-based data sources have shown great potent...
Main Authors: | Li, Jingwei; Huang, Wei; Sia, Choon Ling; Chen, Zhuo; Wu, Tailai; Wang, Qingnan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9205424/ https://www.ncbi.nlm.nih.gov/pubmed/35507921 http://dx.doi.org/10.2196/35266 |
_version_ | 1784729128084176896 |
---|---|
author | Li, Jingwei Huang, Wei Sia, Choon Ling Chen, Zhuo Wu, Tailai Wang, Qingnan |
author_sort | Li, Jingwei |
collection | PubMed |
description | BACKGROUND: The SARS-CoV-2 virus and its variants pose extraordinary challenges for public health worldwide. Timely and accurate forecasting of the COVID-19 epidemic is key to sustaining interventions and policies and efficient resource allocation. Internet-based data sources have shown great potential to supplement traditional infectious disease surveillance, and the combination of different Internet-based data sources has shown greater power to enhance epidemic forecasting accuracy than using a single Internet-based data source. However, existing methods incorporating multiple Internet-based data sources only used real-time data from these sources as exogenous inputs but did not take all the historical data into account. Moreover, the predictive power of different Internet-based data sources in providing early warning for COVID-19 outbreaks has not been fully explored. OBJECTIVE: The main aim of our study is to explore whether combining real-time and historical data from multiple Internet-based sources could improve the COVID-19 forecasting accuracy over the existing baseline models. A secondary aim is to explore the COVID-19 forecasting timeliness based on different Internet-based data sources. METHODS: We first used core terms and symptom-related keyword-based methods to extract COVID-19–related Internet-based data from December 21, 2019, to February 29, 2020. The Internet-based data we explored included 90,493,912 online news articles, 37,401,900 microblogs, and all the Baidu search query data during that period. We then proposed an autoregressive model with exogenous inputs, incorporating real-time and historical data from multiple Internet-based sources. Our proposed model was compared with baseline models, and all the models were tested during the first wave of COVID-19 epidemics in Hubei province and the rest of mainland China separately. We also used lagged Pearson correlations for COVID-19 forecasting timeliness analysis. RESULTS: Our proposed model achieved the highest accuracy in all 5 accuracy measures, compared with all the baseline models of both Hubei province and the rest of mainland China. In mainland China, except for Hubei, the COVID-19 epidemic forecasting accuracy differences between our proposed model (model i) and all the other baseline models were statistically significant (model 1, t(198)=–8.722, P<.001; model 2, t(198)=–5.000, P<.001; model 3, t(198)=–1.882, P=.06; model 4, t(198)=–4.644, P<.001; model 5, t(198)=–4.488, P<.001). In Hubei province, our proposed model's forecasting accuracy improved significantly compared with the baseline model using historical new confirmed COVID-19 case counts only (model 1, t(198)=–1.732, P=.09). Our results also showed that Internet-based sources could provide a 2- to 6-day earlier warning for COVID-19 outbreaks. CONCLUSIONS: Our approach incorporating real-time and historical data from multiple Internet-based sources could improve forecasting accuracy for epidemics of COVID-19 and its variants, which may help improve public health agencies' interventions and resource allocation in mitigating and controlling new waves of COVID-19 or other relevant epidemics. |
format | Online Article Text |
id | pubmed-9205424 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-9205424 2022-06-18 Enhancing COVID-19 Epidemic Forecasting Accuracy by Combining Real-time and Historical Data From Multiple Internet-Based Sources: Analysis of Social Media Data, Online News Articles, and Search Queries Li, Jingwei Huang, Wei Sia, Choon Ling Chen, Zhuo Wu, Tailai Wang, Qingnan JMIR Public Health Surveill Original Paper BACKGROUND: The SARS-CoV-2 virus and its variants pose extraordinary challenges for public health worldwide. Timely and accurate forecasting of the COVID-19 epidemic is key to sustaining interventions and policies and efficient resource allocation. Internet-based data sources have shown great potential to supplement traditional infectious disease surveillance, and the combination of different Internet-based data sources has shown greater power to enhance epidemic forecasting accuracy than using a single Internet-based data source. However, existing methods incorporating multiple Internet-based data sources only used real-time data from these sources as exogenous inputs but did not take all the historical data into account. Moreover, the predictive power of different Internet-based data sources in providing early warning for COVID-19 outbreaks has not been fully explored. OBJECTIVE: The main aim of our study is to explore whether combining real-time and historical data from multiple Internet-based sources could improve the COVID-19 forecasting accuracy over the existing baseline models. A secondary aim is to explore the COVID-19 forecasting timeliness based on different Internet-based data sources. METHODS: We first used core terms and symptom-related keyword-based methods to extract COVID-19–related Internet-based data from December 21, 2019, to February 29, 2020. The Internet-based data we explored included 90,493,912 online news articles, 37,401,900 microblogs, and all the Baidu search query data during that period. We then proposed an autoregressive model with exogenous inputs, incorporating real-time and historical data from multiple Internet-based sources. Our proposed model was compared with baseline models, and all the models were tested during the first wave of COVID-19 epidemics in Hubei province and the rest of mainland China separately. We also used lagged Pearson correlations for COVID-19 forecasting timeliness analysis. RESULTS: Our proposed model achieved the highest accuracy in all 5 accuracy measures, compared with all the baseline models of both Hubei province and the rest of mainland China. In mainland China, except for Hubei, the COVID-19 epidemic forecasting accuracy differences between our proposed model (model i) and all the other baseline models were statistically significant (model 1, t(198)=–8.722, P<.001; model 2, t(198)=–5.000, P<.001; model 3, t(198)=–1.882, P=.06; model 4, t(198)=–4.644, P<.001; model 5, t(198)=–4.488, P<.001). In Hubei province, our proposed model's forecasting accuracy improved significantly compared with the baseline model using historical new confirmed COVID-19 case counts only (model 1, t(198)=–1.732, P=.09). Our results also showed that Internet-based sources could provide a 2- to 6-day earlier warning for COVID-19 outbreaks. CONCLUSIONS: Our approach incorporating real-time and historical data from multiple Internet-based sources could improve forecasting accuracy for epidemics of COVID-19 and its variants, which may help improve public health agencies' interventions and resource allocation in mitigating and controlling new waves of COVID-19 or other relevant epidemics. JMIR Publications 2022-06-16 /pmc/articles/PMC9205424/ /pubmed/35507921 http://dx.doi.org/10.2196/35266 Text en ©Jingwei Li, Wei Huang, Choon Ling Sia, Zhuo Chen, Tailai Wu, Qingnan Wang. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 16.06.2022.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included. |
spellingShingle | Original Paper Li, Jingwei Huang, Wei Sia, Choon Ling Chen, Zhuo Wu, Tailai Wang, Qingnan Enhancing COVID-19 Epidemic Forecasting Accuracy by Combining Real-time and Historical Data From Multiple Internet-Based Sources: Analysis of Social Media Data, Online News Articles, and Search Queries |
title | Enhancing COVID-19 Epidemic Forecasting Accuracy by Combining Real-time and Historical Data From Multiple Internet-Based Sources: Analysis of Social Media Data, Online News Articles, and Search Queries |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9205424/ https://www.ncbi.nlm.nih.gov/pubmed/35507921 http://dx.doi.org/10.2196/35266 |
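
The abstract mentions lagged Pearson correlations as the tool for the forecasting timeliness analysis. The following is a minimal sketch of that general technique, not the authors' implementation: the function names (`pearson`, `lagged_correlations`, `best_lead`), the lag range, and the toy series are all assumptions made for illustration, and the data is synthetic rather than taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


def lagged_correlations(signal, cases, max_lag):
    """Correlate signal[t] with cases[t + lag] for lag = 0..max_lag (in days)."""
    result = {}
    for lag in range(max_lag + 1):
        # Align the series: drop the last `lag` signal values
        # and the first `lag` case values.
        x = signal[: len(signal) - lag]
        y = cases[lag:]
        result[lag] = pearson(x, y)
    return result


def best_lead(signal, cases, max_lag):
    """Lag (in days) at which the Internet signal best anticipates case counts."""
    corr = lagged_correlations(signal, cases, max_lag)
    return max(corr, key=corr.get)


# Synthetic toy data (hypothetical): the case series repeats the
# Internet signal three days later.
signal = [1, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13]
cases = [0, 0, 0] + signal[:-3]

print(best_lead(signal, cases, max_lag=6))  # prints 3
```

In this toy setup the correlation peaks at a lag of 3 days because the case series was constructed as a 3-day-delayed copy of the signal. With real daily counts, the lag with the highest correlation would be read as the approximate lead time a given Internet source offers.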