Forecasting incidence of infectious diarrhea using random forest in Jiangsu Province, China

BACKGROUND: Infectious diarrhea can lead to a considerable global disease burden. Thus, the accurate prediction of an infectious diarrhea epidemic is crucial for public health authorities. This study was aimed at developing an optimal random forest (RF) model, considering meteorological factors used...

Bibliographic Details
Main authors: Fang, Xinyu; Liu, Wendong; Ai, Jing; He, Mike; Wu, Ying; Shi, Yingying; Shen, Wenqi; Bao, Changjun
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7071679/
https://www.ncbi.nlm.nih.gov/pubmed/32171261
http://dx.doi.org/10.1186/s12879-020-4930-2
_version_ 1783506257248780288
author Fang, Xinyu
Liu, Wendong
Ai, Jing
He, Mike
Wu, Ying
Shi, Yingying
Shen, Wenqi
Bao, Changjun
author_sort Fang, Xinyu
collection PubMed
description BACKGROUND: Infectious diarrhea can lead to a considerable global disease burden, so accurate prediction of infectious diarrhea epidemics is crucial for public health authorities. This study aimed to develop an optimal random forest (RF) model that uses meteorological factors to predict the incidence of infectious diarrhea in Jiangsu Province, China. METHODS: An RF model was developed and compared with classical autoregressive integrated moving average (ARIMA/ARIMAX) models. Morbidity and meteorological data from 2012 to 2016 were used to construct the models, and data from 2017 were used for testing. RESULTS: The RF model used atmospheric pressure, precipitation, relative humidity, and their lagged terms, together with morbidity lagged by 1–4 weeks and a time variable, as predictors. A univariate ARIMA(1,0,1)(1,0,0)_52 model (AIC = −575.92, BIC = −558.14) and a multivariable ARIMAX(1,0,1)(1,0,0)_52 model with precipitation at 0–1 week lags (AIC = −578.58, BIC = −578.13) were developed as benchmarks. The RF model outperformed the ARIMA/X models, with a mean absolute percentage error (MAPE) of approximately 20%. The performance of the ARIMAX model was comparable to that of the ARIMA model, with a MAPE reaching approximately 30%. CONCLUSIONS: The RF model fitted the dynamic nature of an infectious diarrhea epidemic well and delivered an ideal prediction accuracy. It combined the synchronous and lagged effects of meteorological factors and also integrated the autocorrelation and seasonality of the morbidity. The RF model can be used to predict the epidemic level and has high potential for practical implementation.
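The predictor set and error metric named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `make_lagged_rows` and `mape` are hypothetical, only precipitation stands in for the meteorological variables, and the 0–1 week precipitation lags are borrowed from the ARIMAX benchmark as an assumption, since the abstract does not state which lags the RF model used. The MAPE formula is the standard definition, 100 × mean(|actual − predicted| / |actual|).

```python
from typing import List, Sequence, Tuple


def make_lagged_rows(
    morbidity: Sequence[float],
    precipitation: Sequence[float],
    morbidity_lags: Sequence[int] = (1, 2, 3, 4),  # 1-4 week lags, as in the abstract
    precip_lags: Sequence[int] = (0, 1),           # assumption: illustrative lags only
) -> Tuple[List[List[float]], List[float]]:
    """Turn two aligned weekly series into (feature rows, targets).

    Each row holds lagged morbidity, lagged precipitation, and the week
    index as a simple time variable. The target is morbidity in week t.
    """
    if len(morbidity) != len(precipitation):
        raise ValueError("series must have the same length")
    # The first usable week is the one where every requested lag exists.
    start = max(max(morbidity_lags), max(precip_lags))
    rows: List[List[float]] = []
    targets: List[float] = []
    for t in range(start, len(morbidity)):
        features = [morbidity[t - k] for k in morbidity_lags]
        features += [precipitation[t - k] for k in precip_lags]
        features.append(float(t))  # time variable
        rows.append(features)
        targets.append(morbidity[t])
    return rows, targets


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and equally long")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

Rows built this way could be passed to any regression model, for example a random forest regressor trained on the 2012 to 2016 weeks and scored with `mape` on the 2017 weeks, which mirrors the train/test split described in the abstract.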
format Online
Article
Text
id pubmed-7071679
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7071679 2020-03-18 Forecasting incidence of infectious diarrhea using random forest in Jiangsu Province, China Fang, Xinyu; Liu, Wendong; Ai, Jing; He, Mike; Wu, Ying; Shi, Yingying; Shen, Wenqi; Bao, Changjun BMC Infect Dis Research Article
BioMed Central 2020-03-14 /pmc/articles/PMC7071679/ /pubmed/32171261 http://dx.doi.org/10.1186/s12879-020-4930-2 Text en © The Author(s). 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Forecasting incidence of infectious diarrhea using random forest in Jiangsu Province, China
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7071679/
https://www.ncbi.nlm.nih.gov/pubmed/32171261
http://dx.doi.org/10.1186/s12879-020-4930-2