A multivariate multi-step LSTM forecasting model for tuberculosis incidence with model explanation in Liaoning Province, China

Bibliographic details
Main authors: Yang, Enbin, Zhang, Hao, Guo, Xinsheng, Zang, Zinan, Liu, Zhen, Liu, Yuanning
Format: Online article, text
Language: English
Published: BioMed Central 2022
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9128107/
https://www.ncbi.nlm.nih.gov/pubmed/35606725
http://dx.doi.org/10.1186/s12879-022-07462-8

Abstract
BACKGROUND: Tuberculosis (TB) is the respiratory infectious disease with the highest incidence in China. We aim to design a series of forecasting models and identify the factors that affect TB incidence, thereby improving the accuracy of incidence prediction. RESULTS: In this paper, we developed a new interpretable prediction system based on the multivariate multi-step Long Short-Term Memory (LSTM) model and the SHapley Additive exPlanations (SHAP) method. Four accuracy measures are used in the system: Root Mean Square Error, Mean Absolute Error, Mean Absolute Percentage Error, and symmetric Mean Absolute Percentage Error. Autoregressive Integrated Moving Average (ARIMA) and seasonal ARIMA models are established. The multi-step ARIMA–LSTM model is proposed for the first time, and the performance of each model is examined in the short, medium, and long term. Compared with the ARIMA model, the multivariate 2-step LSTM model reduces the four errors by 12.92%, 15.94%, 15.97%, and 14.81%, respectively, in the short term. The 3-step ARIMA–LSTM model achieved excellent performance, with the four errors decreased to 15.19%, 33.14%, 36.79%, and 29.76% in the medium and long term. We also provide, for the first time in the field of incidence prediction, local and global explanations of the multivariate single-step LSTM model. CONCLUSIONS: The multivariate 2-step LSTM model is suitable for short-term prediction and performs similarly to models in previous studies. The 3-step ARIMA–LSTM model is appropriate for medium-to-long-term prediction and outperforms these models. The SHAP results indicate that the five most important features are maximum temperature, average relative humidity, local financial budget, monthly sunshine percentage, and sunshine hours. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12879-022-07462-8.
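The four accuracy measures named in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the sMAPE denominator (the mean of |actual| and |forecast|) is one common convention among several, and reading "each error is reduced by X%" as the relative reduction (baseline error minus model error, divided by baseline error) is also an assumption. The function names and the sample values are hypothetical.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], forecast: Sequence[float]) -> int:
    """Return the series length after checking that both series align."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    return len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root Mean Square Error: square root of the mean squared residual."""
    n = _check(actual, forecast)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Error: mean of the absolute residuals."""
    n = _check(actual, forecast)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / n


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent (undefined if any actual is 0)."""
    n = _check(actual, forecast)
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE, in percent; assumed convention: |f - a| / ((|a| + |f|) / 2)."""
    n = _check(actual, forecast)
    return 100.0 * sum(
        abs(f - a) / ((abs(a) + abs(f)) / 2) for a, f in zip(actual, forecast)
    ) / n


def relative_reduction(baseline_error: float, model_error: float) -> float:
    """Percentage by which model_error is below baseline_error (assumed definition)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical incidence values, for illustration only (not data from the study).
actual = [10.0, 12.0, 8.0]
forecast = [11.0, 12.0, 6.0]
print(mae(actual, forecast))  # → 1.0
```

MAPE divides by each observed value, so it is undefined when an observation is zero and inflates errors on small values; reporting a symmetric variant alongside it is one common way to temper that.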

Journal: BMC Infect Dis (article type: Research), BioMed Central; published 2022-05-23.
License: © The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.