
Retrospective analysis of the accuracy of predicting the alert level of COVID-19 in 202 countries using Google Trends and machine learning


Bibliographic details
Main authors:	Peng, Yuanyuan; Li, Cuilian; Rong, Yibiao; Chen, Xinjian; Chen, Haoyu
Format:	Online Article Text
Language:	English
Published:	International Society of Global Health 2020
Subject:	Research Theme 1: COVID-19 Pandemic
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7567446/ https://www.ncbi.nlm.nih.gov/pubmed/33110594 http://dx.doi.org/10.7189/jogh.10.020511

Abstract
BACKGROUND: Internet search engine data, such as Google Trends, have been shown to correlate with the incidence of COVID-19, but only in a few countries. We aimed to develop a model from a small number of countries to predict the epidemic alert level in all countries worldwide.
METHODS: We searched the “interest over time” and “interest by region” Google Trends data for Coronavirus, pneumonia, and six COVID symptom-related terms. The daily incidence of COVID-19 in 202 countries from 10 January to 23 April 2020 was retrieved from the World Health Organization. Three alert levels were defined. Ten weeks' data from 20 countries were used to train machine learning algorithms, and features were selected according to their correlation and importance. The model was then tested on 2830 samples from 202 countries.
RESULTS: Our model performed well in 154 (76.2%) countries, each of which had no more than four misclassified samples. In these 154 countries, the accuracy was 0.8133 and the kappa coefficient was 0.6828; across all 202 countries, the accuracy was 0.7527 and the kappa coefficient was 0.5841. The proposed algorithm, based on Random Forest Classification with nine features, performed better than other machine learning methods and than models with different numbers of features.
CONCLUSIONS: Our results suggest that a model developed from 20 countries with Google Trends data and Random Forest Classification can be applied to predict the epidemic alert levels of most countries worldwide.
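The abstract reports each result as an accuracy paired with a kappa coefficient. Assuming the kappa is Cohen's kappa (the abstract does not say which variant was used), both metrics for a three-level classification can be computed as below, with κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from the label frequencies. This is a minimal standard-library sketch, not the authors' code; the 0/1/2 encoding of the alert levels and the toy label lists are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted alert level equals the observed one."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement (also validates inputs)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over labels of the product of the two marginal frequencies.
    labels = set(true_counts) | set(pred_counts)
    p_e = sum(true_counts[k] * pred_counts[k] for k in labels) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both lists contain one identical class")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical encoding of the three alert levels: 0 = low, 1 = medium, 2 = high.
observed = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
print(round(accuracy(observed, predicted), 4))     # → 0.6667
print(round(cohen_kappa(observed, predicted), 4))  # → 0.5
```

In this toy case, one third of the agreement is expected by chance (p_e = 12/36), so kappa falls below accuracy. The same holds whenever agreement is imperfect, which is consistent with each reported kappa being lower than its paired accuracy.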
Journal:	J Glob Health (International Society of Global Health), Research Theme 1: COVID-19 Pandemic
Dates:	2020-12 (issue); 2020-09-23 (published online)
Rights:	Copyright © 2020 by the Journal of Global Health. All rights reserved. This work is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
