Comparison of ARIMA model and XGBoost model for prediction of human brucellosis in mainland China: a time-series study

Bibliographic Details
Main Authors: Alim, Mirxat; Ye, Guo-Hua; Guan, Peng; Huang, De-Sheng; Zhou, Bao-Sen; Wu, Wei
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2020
Subjects: Epidemiology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7722837/
https://www.ncbi.nlm.nih.gov/pubmed/33293308
http://dx.doi.org/10.1136/bmjopen-2020-039676
collection PubMed
description OBJECTIVES: Human brucellosis is a public health problem that endangers health and property in China. Predicting the trend and seasonality of human brucellosis is of great significance for its prevention. In this study, the autoregressive integrated moving average (ARIMA) model and the eXtreme Gradient Boosting (XGBoost) model were compared to determine which is more suitable for predicting the occurrence of brucellosis in mainland China.
DESIGN: Time-series study.
SETTING: Mainland China.
METHODS: Data on human brucellosis in mainland China were provided by the National Health and Family Planning Commission of China. The data were divided into a training set and a test set. The training set comprised the monthly incidence of human brucellosis in mainland China from January 2008 to June 2018, and the test set comprised the monthly incidence from July 2018 to June 2019. The mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE) were used to evaluate model fitting and prediction performance.
RESULTS: The number of human brucellosis patients in mainland China increased from 30 002 in 2008 to 40 328 in 2018. The original time series showed an increasing trend and an obvious seasonal distribution. For the training set, the MAE, RMSE and MAPE were 338.867, 450.223 and 10.323, respectively, for the ARIMA(0,1,1)×(0,1,1)₁₂ model, and 189.332, 262.458 and 4.475, respectively, for the XGBoost model. For the test set, the MAE, RMSE and MAPE were 529.406, 586.059 and 17.676, respectively, for the ARIMA(0,1,1)×(0,1,1)₁₂ model, and 249.307, 280.645 and 7.643, respectively, for the XGBoost model.
CONCLUSIONS: The XGBoost model performed better than the ARIMA model and is more suitable for predicting cases of human brucellosis in mainland China.
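The evaluation protocol described in the abstract (a chronological split of a monthly series, then MAE, RMSE and MAPE on each part) can be sketched in standard-library Python. This is an illustrative sketch, not the authors' code: the helper names `month_range`, `chronological_split` and `seasonal_difference` are hypothetical, MAPE is assumed to be expressed in percent, and `seasonal_difference` only shows the (1 - B)(1 - B^12) operator implied by d = 1, D = 1 and a 12-month season in ARIMA(0,1,1)×(0,1,1)12. Estimating the moving-average terms or training XGBoost would need a dedicated library (for example statsmodels or xgboost), which is not shown here.

```python
import math
from datetime import date


# Hypothetical helpers for illustration only; not the study's code.

def month_range(start, end):
    """Return the first day of every month from start to end, inclusive."""
    months = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        months.append(date(y, m, 1))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return months


def chronological_split(months, values, last_train_month):
    """Split a monthly series after last_train_month without shuffling,
    so no test month is seen during fitting."""
    cut = months.index(last_train_month) + 1
    return values[:cut], values[cut:]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error; weights large misses more heavily than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error.
    Assumption: expressed in percent; undefined if any actual value is 0."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def seasonal_difference(y, s=12):
    """Apply (1 - B)(1 - B^s), the differencing implied by d = 1, D = 1:
    w_t = y_t - y_{t-1} - y_{t-s} + y_{t-s-1}."""
    return [y[t] - y[t - 1] - y[t - s] + y[t - s - 1] for t in range(s + 1, len(y))]


# January 2008 to June 2019, as in the abstract.
months = month_range(date(2008, 1, 1), date(2019, 6, 1))
print(len(months))  # 138

# Placeholder values stand in for the incidence series (not study data).
placeholder = list(range(len(months)))
train, test = chronological_split(months, placeholder, date(2018, 6, 1))
print(len(train), len(test))  # 126 12
```

Because the split is chronological, the 12 months from July 2018 to June 2019 act as a true hold-out period, so test-set errors like those reported in the abstract measure out-of-sample forecasting rather than in-sample fit.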
id pubmed-7722837
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7722837 (2020-12-14)
BMJ Open, Epidemiology
Published by BMJ Publishing Group, 2020-12-07
© Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY-NC. No commercial re-use. Published by BMJ. This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
topic Epidemiology