
Prediction of hepatitis E using machine learning models

BACKGROUND: Accurate and reliable predictions of infectious disease can be valuable to public health organizations that plan interventions to decrease or prevent disease transmission. A great variety of models have been developed for this task. However, for different data series, the performance of...


Bibliographic Details
Main authors:	Guo, Yanhui; Feng, Yi; Qu, Fuli; Zhang, Li; Yan, Bingyu; Lv, Jingjing
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2020
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7497991/ https://www.ncbi.nlm.nih.gov/pubmed/32941452 http://dx.doi.org/10.1371/journal.pone.0237750

author	Guo, Yanhui; Feng, Yi; Qu, Fuli; Zhang, Li; Yan, Bingyu; Lv, Jingjing
collection	PubMed
description	BACKGROUND: Accurate and reliable predictions of infectious disease can be valuable to public health organizations that plan interventions to decrease or prevent disease transmission. A great variety of models have been developed for this task. However, the performance of these models varies across data series. Hepatitis E, an acute liver disease, has been a major public health problem. Which model is more appropriate for predicting the incidence of hepatitis E? In this paper, three methods are applied and their performance is compared.
METHODS: Autoregressive integrated moving average (ARIMA), support vector machine (SVM) and long short-term memory (LSTM) recurrent neural network models were adopted and compared. ARIMA was implemented in Python with statsmodels. SVM was implemented in MATLAB with the LIBSVM library. We designed the LSTM with Keras, a deep learning library. To tackle overfitting caused by the limited number of training samples, we adopted dropout and regularization strategies in our LSTM model. Experimental data were the monthly incidence and number of cases of hepatitis E from January 2005 to December 2017 in Shandong province, China. We selected data from July 2015 to December 2017 to validate the models, and the rest was used as the training set. Three metrics were applied to compare the performance of the models: root mean square error (RMSE), mean absolute percentage error (MAPE) and mean absolute error (MAE).
RESULTS: After analyzing the data, we took ARIMA(1, 1, 1) and ARIMA(3, 1, 2) as the monthly incidence prediction model and the case-number prediction model, respectively. Cross-validation and grid search were used to optimize the SVM parameters. The penalty coefficient C and kernel function parameter g were set to 8 and 0.125 for incidence prediction, and to 22 and 0.01 for case-number prediction. The LSTM had 4 nodes. The dropout and L2 regularization parameters were set to 0.15 and 0.001, respectively. For RMSE, we obtained 0.022, 0.0204 and 0.01 for incidence prediction using ARIMA, SVM and LSTM, respectively, and 22.25, 20.0368 and 11.75 for case-number prediction. For MAPE, the results were 23.5%, 21.7% and 15.08% for incidence prediction, and 23.6%, 21.44% and 13.6% for case-number prediction. For MAE, the results were 0.018, 0.0167 and 0.011 for incidence prediction, and 18.003, 16.5815 and 9.984 for case-number prediction.
CONCLUSIONS: Comparing ARIMA, SVM and LSTM, we found that the nonlinear models (SVM, LSTM) outperform the linear model (ARIMA). LSTM obtained the best performance on all three metrics (RMSE, MAPE, MAE). Hence, LSTM is the most suitable for predicting the monthly incidence and number of cases of hepatitis E.
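The three error metrics and the chronological hold-out split named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy values and the placeholder series are assumptions made for illustration, and the formulas are the standard textbook definitions of RMSE, MAE and MAPE. The only figures taken from the record are the date ranges, which give 156 monthly points (January 2005 to December 2017) with the last 30 (July 2015 to December 2017) held out for validation.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: sqrt of the mean squared difference."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, so callers must ensure
    every actual value is non-zero.
    """
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Hypothetical toy values, NOT the Shandong hepatitis E data.
actual = [2.0, 4.0, 5.0]
predicted = [1.0, 5.0, 5.0]

print("RMSE:", rmse(actual, predicted))
print("MAE: ", mae(actual, predicted))
print("MAPE:", mape(actual, predicted), "%")

# Chronological hold-out split matching the date ranges in the abstract:
# 13 years x 12 months = 156 points, the last 30 kept for validation.
N_MONTHS = 13 * 12
N_VALID = 30
series = list(range(N_MONTHS))  # placeholder for the monthly series
train, valid = series[:-N_VALID], series[-N_VALID:]
print(len(train), "training months,", len(valid), "validation months")
```

Splitting by position rather than at random keeps every validation month later than every training month, which is the usual choice for time-series forecasting and is consistent with the split described in the abstract.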
format	Online Article Text
id	pubmed-7497991
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-7497991 2020-09-24 Prediction of hepatitis E using machine learning models Guo, Yanhui; Feng, Yi; Qu, Fuli; Zhang, Li; Yan, Bingyu; Lv, Jingjing PLoS One Research Article Public Library of Science 2020-09-17 /pmc/articles/PMC7497991/ /pubmed/32941452 http://dx.doi.org/10.1371/journal.pone.0237750 Text en © 2020 Guo et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Prediction of hepatitis E using machine learning models
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7497991/ https://www.ncbi.nlm.nih.gov/pubmed/32941452 http://dx.doi.org/10.1371/journal.pone.0237750
