A multivariate multi-step LSTM forecasting model for tuberculosis incidence with model explanation in Liaoning Province, China
BACKGROUND: Tuberculosis (TB) is the respiratory infectious disease with the highest incidence in China. We aim to design a series of forecasting models and find the factors that affect the incidence of TB, thereby improving the accuracy of the incidence prediction. RESULTS: In this paper, we develo...
Main authors: | Yang, Enbin; Zhang, Hao; Guo, Xinsheng; Zang, Zinan; Liu, Zhen; Liu, Yuanning |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9128107/ https://www.ncbi.nlm.nih.gov/pubmed/35606725 http://dx.doi.org/10.1186/s12879-022-07462-8 |
_version_ | 1784712492478365696 |
---|---|
author | Yang, Enbin Zhang, Hao Guo, Xinsheng Zang, Zinan Liu, Zhen Liu, Yuanning |
author_facet | Yang, Enbin Zhang, Hao Guo, Xinsheng Zang, Zinan Liu, Zhen Liu, Yuanning |
author_sort | Yang, Enbin |
collection | PubMed |
description | BACKGROUND: Tuberculosis (TB) is the respiratory infectious disease with the highest incidence in China. We aim to design a series of forecasting models and find the factors that affect the incidence of TB, thereby improving the accuracy of the incidence prediction. RESULTS: In this paper, we developed a new interpretable prediction system based on the multivariate multi-step Long Short-Term Memory (LSTM) model and SHapley Additive exPlanation (SHAP) method. Four accuracy measures are introduced into the system: Root Mean Square Error, Mean Absolute Error, Mean Absolute Percentage Error, and symmetric Mean Absolute Percentage Error. The Autoregressive Integrated Moving Average (ARIMA) model and seasonal ARIMA model are established. The multi-step ARIMA–LSTM model is proposed for the first time to examine the performance of each model in the short, medium, and long term, respectively. Compared with the ARIMA model, each error of the multivariate 2-step LSTM model is reduced by 12.92%, 15.94%, 15.97%, and 14.81% in the short term. The 3-step ARIMA–LSTM model achieved excellent performance, with each error decreased to 15.19%, 33.14%, 36.79%, and 29.76% in the medium and long term. We provide the local and global explanation of the multivariate single-step LSTM model in the field of incidence prediction, pioneering. CONCLUSIONS: The multivariate 2-step LSTM model is suitable for short-term prediction and obtained a similar performance as previous studies. The 3-step ARIMA–LSTM model is appropriate for medium-to-long-term prediction and outperforms these models. The SHAP results indicate that the five most crucial features are maximum temperature, average relative humidity, local financial budget, monthly sunshine percentage, and sunshine hours. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12879-022-07462-8. |
format | Online Article Text |
id | pubmed-9128107 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9128107 2022-05-25 A multivariate multi-step LSTM forecasting model for tuberculosis incidence with model explanation in Liaoning Province, China Yang, Enbin Zhang, Hao Guo, Xinsheng Zang, Zinan Liu, Zhen Liu, Yuanning BMC Infect Dis Research BACKGROUND: Tuberculosis (TB) is the respiratory infectious disease with the highest incidence in China. We aim to design a series of forecasting models and find the factors that affect the incidence of TB, thereby improving the accuracy of the incidence prediction. RESULTS: In this paper, we developed a new interpretable prediction system based on the multivariate multi-step Long Short-Term Memory (LSTM) model and SHapley Additive exPlanation (SHAP) method. Four accuracy measures are introduced into the system: Root Mean Square Error, Mean Absolute Error, Mean Absolute Percentage Error, and symmetric Mean Absolute Percentage Error. The Autoregressive Integrated Moving Average (ARIMA) model and seasonal ARIMA model are established. The multi-step ARIMA–LSTM model is proposed for the first time to examine the performance of each model in the short, medium, and long term, respectively. Compared with the ARIMA model, each error of the multivariate 2-step LSTM model is reduced by 12.92%, 15.94%, 15.97%, and 14.81% in the short term. The 3-step ARIMA–LSTM model achieved excellent performance, with each error decreased to 15.19%, 33.14%, 36.79%, and 29.76% in the medium and long term. We provide the local and global explanation of the multivariate single-step LSTM model in the field of incidence prediction, pioneering. CONCLUSIONS: The multivariate 2-step LSTM model is suitable for short-term prediction and obtained a similar performance as previous studies. The 3-step ARIMA–LSTM model is appropriate for medium-to-long-term prediction and outperforms these models. The SHAP results indicate that the five most crucial features are maximum temperature, average relative humidity, local financial budget, monthly sunshine percentage, and sunshine hours. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12879-022-07462-8. BioMed Central 2022-05-23 /pmc/articles/PMC9128107/ /pubmed/35606725 http://dx.doi.org/10.1186/s12879-022-07462-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Yang, Enbin Zhang, Hao Guo, Xinsheng Zang, Zinan Liu, Zhen Liu, Yuanning A multivariate multi-step LSTM forecasting model for tuberculosis incidence with model explanation in Liaoning Province, China |
title | A multivariate multi-step LSTM forecasting model for tuberculosis incidence with model explanation in Liaoning Province, China |
title_full | A multivariate multi-step LSTM forecasting model for tuberculosis incidence with model explanation in Liaoning Province, China |
title_fullStr | A multivariate multi-step LSTM forecasting model for tuberculosis incidence with model explanation in Liaoning Province, China |
title_full_unstemmed | A multivariate multi-step LSTM forecasting model for tuberculosis incidence with model explanation in Liaoning Province, China |
title_short | A multivariate multi-step LSTM forecasting model for tuberculosis incidence with model explanation in Liaoning Province, China |
title_sort | multivariate multi-step lstm forecasting model for tuberculosis incidence with model explanation in liaoning province, china |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9128107/ https://www.ncbi.nlm.nih.gov/pubmed/35606725 http://dx.doi.org/10.1186/s12879-022-07462-8 |
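
The description above names four accuracy measures: Root Mean Square Error, Mean Absolute Error, Mean Absolute Percentage Error, and symmetric Mean Absolute Percentage Error. The record does not give their formulas, so the following is only a minimal sketch using the common textbook definitions. The function names and the sample values are hypothetical and are not taken from the article. In particular, sMAPE has several variants in the literature, and the one shown here (denominator equal to the mean of the absolute actual and predicted values) is an assumption.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared difference."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Undefined when any actual value is zero.
    """
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def smape(actual, predicted):
    """Symmetric MAPE, in percent (assumed variant).

    Each term is |a - p| divided by the mean of |a| and |p|.
    Undefined when an actual and its prediction are both zero.
    """
    n = len(actual)
    total = sum(
        2.0 * abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)
    )
    return 100.0 * total / n


# Made-up illustrative values, not data from the study.
observed = [10.0, 20.0]
forecast = [12.0, 18.0]
metrics = {
    "RMSE": rmse(observed, forecast),
    "MAE": mae(observed, forecast),
    "MAPE": mape(observed, forecast),
    "sMAPE": smape(observed, forecast),
}
```

Under these definitions, all four measures are zero for a perfect forecast, and lower values indicate a better fit. RMSE and MAE are in the units of the series, while MAPE and sMAPE are percentages.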