
Time series analysis of hemorrhagic fever with renal syndrome in mainland China by using an XGBoost forecasting model


Bibliographic Details
Main Authors: Lv, Cai-Xia; An, Shu-Yi; Qiao, Bao-Jun; Wu, Wei
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8377883/
https://www.ncbi.nlm.nih.gov/pubmed/34412581
http://dx.doi.org/10.1186/s12879-021-06503-y
author Lv, Cai-Xia
An, Shu-Yi
Qiao, Bao-Jun
Wu, Wei
collection PubMed
description BACKGROUND: Hemorrhagic fever with renal syndrome (HFRS) continues to attract public attention because of outbreaks in various cities in China. Predicting future outbreaks or epidemics of the disease from past incidence data can help health departments take targeted preventive measures in advance. In this study, we propose a multistep prediction strategy based on extreme gradient boosting (XGBoost) for HFRS as an extension of the one-step prediction model. Moreover, the fitting and prediction accuracy of the XGBoost model is compared with that of the autoregressive integrated moving average (ARIMA) model using several evaluation indicators. METHODS: We collected HFRS incidence data for mainland China from 2004 to 2018. The data from 2004 to 2017 were used as the training set to establish the seasonal ARIMA model and the XGBoost model, while the 2018 data were used to test prediction performance. In the multistep XGBoost forecasting model, one-hot encoding was used to handle seasonal features. Furthermore, a series of evaluation indices were computed to assess the accuracy of the multistep XGBoost forecasting model. RESULTS: There were 200,237 HFRS cases in China from 2004 to 2018. A long-term downward trend and bimodal seasonality were identified in the original time series. According to the minimum corrected Akaike information criterion (CAIC) value, the optimal model selected was ARIMA(3, 1, 0) × (1, 1, 0)_12. In the fitting part, the ME, MAE, MPE, MAPE, and MASE indices of the XGBoost model were higher than those of the ARIMA model, whereas the RMSE of the XGBoost model was lower. The prediction performance indicators (MAE, MPE, MAPE, RMSE, and MASE) of both the one-step and the multistep XGBoost models were all notably lower than those of the ARIMA model. CONCLUSIONS: The multistep XGBoost prediction model showed much better prediction accuracy and model stability than the multistep ARIMA prediction model. The XGBoost model performed better in predicting complicated and nonlinear data such as HFRS incidence. Additionally, multistep prediction models are more practical than one-step prediction models for forecasting infectious diseases. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12879-021-06503-y.
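The evaluation indices named in the abstract (ME, RMSE, MAE, MPE, MAPE, MASE) can be sketched as follows. This is a minimal illustration based on their standard textbook definitions, not the authors' code. The function name `forecast_metrics`, its parameters, and the choice of a lag-`m` naive benchmark for scaling MASE are assumptions made here; the record does not say which MASE scaling the study used.

```python
import math


def forecast_metrics(actual, predicted, train, m=1):
    """Return ME, RMSE, MAE, MPE, MAPE and MASE for one set of forecasts.

    actual, predicted: equal-length sequences covering the evaluation period.
    train: in-sample observations, used only to scale MASE.
    m: lag of the naive benchmark (assumption: 1 = previous value,
       12 = same month of the previous year for monthly data).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    if len(train) <= m:
        raise ValueError("train must contain more than m observations")

    # Errors are defined as actual minus predicted.
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)

    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # Percentage errors are undefined when an actual value is zero.
    pct = [100.0 * e / a for e, a in zip(errors, actual)]
    mpe = sum(pct) / n
    mape = sum(abs(p) for p in pct) / n

    # MASE scales the MAE by the in-sample MAE of the lag-m naive forecast.
    naive_mae = sum(abs(train[i] - train[i - m]) for i in range(m, len(train)))
    naive_mae /= len(train) - m
    mase = mae / naive_mae

    return {"ME": me, "RMSE": rmse, "MAE": mae,
            "MPE": mpe, "MAPE": mape, "MASE": mase}
```

Lower RMSE, MAE, MAPE, and MASE values indicate a closer fit or forecast; ME and MPE are signed, so values near zero indicate little systematic bias.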
format Online
Article
Text
id pubmed-8377883
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8377883 2021-08-23 BMC Infect Dis Research Article BioMed Central 2021-08-19 /pmc/articles/PMC8377883/ /pubmed/34412581 http://dx.doi.org/10.1186/s12879-021-06503-y Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Time series analysis of hemorrhagic fever with renal syndrome in mainland China by using an XGBoost forecasting model
topic Research Article