
Application of a data-driven XGBoost model for the prediction of COVID-19 in the USA: a time-series study

OBJECTIVE: The COVID-19 outbreak was first reported in Wuhan, China, and has been acknowledged as a pandemic due to its rapid spread worldwide. Predicting the trend of COVID-19 is of great significance for its prevention. A comparison between the autoregressive integrated moving average (ARIMA) mode...


Bibliographic Details
Main authors: Fang, Zheng-gang, Yang, Shu-qin, Lv, Cai-xia, An, Shu-yi, Wu, Wei
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2022
Subjects: Epidemiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9251895/
https://www.ncbi.nlm.nih.gov/pubmed/35777884
http://dx.doi.org/10.1136/bmjopen-2021-056685
author Fang, Zheng-gang
Yang, Shu-qin
Lv, Cai-xia
An, Shu-yi
Wu, Wei
author_sort Fang, Zheng-gang
collection PubMed
description OBJECTIVE: The COVID-19 outbreak was first reported in Wuhan, China, and has been acknowledged as a pandemic due to its rapid spread worldwide. Predicting the trend of COVID-19 is of great significance for its prevention. A comparison between the autoregressive integrated moving average (ARIMA) model and the eXtreme Gradient Boosting (XGBoost) model was conducted to determine which was more accurate for anticipating the occurrence of COVID-19 in the USA. DESIGN: Time-series study. SETTING: The USA was the setting for this study. MAIN OUTCOME MEASURES: Three accuracy metrics, mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE), were applied to evaluate the performance of the two models. RESULTS: In our study, for the training set and the validation set, the MAE, RMSE and MAPE of the XGBoost model were less than those of the ARIMA model. CONCLUSIONS: The XGBoost model can help improve prediction of COVID-19 cases in the USA over the ARIMA model.
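The three accuracy metrics named in the description have standard definitions, sketched below in plain Python for illustration. This is not the authors' code: the function names and the sample values are hypothetical, and MAPE is expressed here as a percentage and assumes no observed value is zero.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero (division by the actual value).
    """
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


# Hypothetical case counts, made up for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]

print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

Under all three metrics a lower value means a closer fit, which is the sense in which the description reports one model's errors as smaller than the other's.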
format Online
Article
Text
id pubmed-9251895
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-9251895 2022-07-05 Application of a data-driven XGBoost model for the prediction of COVID-19 in the USA: a time-series study Fang, Zheng-gang Yang, Shu-qin Lv, Cai-xia An, Shu-yi Wu, Wei BMJ Open Epidemiology OBJECTIVE: The COVID-19 outbreak was first reported in Wuhan, China, and has been acknowledged as a pandemic due to its rapid spread worldwide. Predicting the trend of COVID-19 is of great significance for its prevention. A comparison between the autoregressive integrated moving average (ARIMA) model and the eXtreme Gradient Boosting (XGBoost) model was conducted to determine which was more accurate for anticipating the occurrence of COVID-19 in the USA. DESIGN: Time-series study. SETTING: The USA was the setting for this study. MAIN OUTCOME MEASURES: Three accuracy metrics, mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE), were applied to evaluate the performance of the two models. RESULTS: In our study, for the training set and the validation set, the MAE, RMSE and MAPE of the XGBoost model were less than those of the ARIMA model. CONCLUSIONS: The XGBoost model can help improve prediction of COVID-19 cases in the USA over the ARIMA model. BMJ Publishing Group 2022-07-01 /pmc/articles/PMC9251895/ /pubmed/35777884 http://dx.doi.org/10.1136/bmjopen-2021-056685 Text en © Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
title Application of a data-driven XGBoost model for the prediction of COVID-19 in the USA: a time-series study
topic Epidemiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9251895/
https://www.ncbi.nlm.nih.gov/pubmed/35777884
http://dx.doi.org/10.1136/bmjopen-2021-056685