A Comparison of Infectious Disease Forecasting Methods across Locations, Diseases, and Time

Accurate infectious disease forecasting can inform efforts to prevent outbreaks and mitigate adverse impacts. This study compares the performance of statistical, machine learning (ML), and deep learning (DL) approaches in forecasting infectious disease incidences across different countries and time...

Bibliographic Details
Main Authors:	Dixon, Samuel; Keshavamurthy, Ravikiran; Farber, Daniel H.; Stevens, Andrew; Pazdernik, Karl T.; Charles, Lauren E.
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8875569/ https://www.ncbi.nlm.nih.gov/pubmed/35215129 http://dx.doi.org/10.3390/pathogens11020185

_version_	1784657949738663936
author	Dixon, Samuel; Keshavamurthy, Ravikiran; Farber, Daniel H.; Stevens, Andrew; Pazdernik, Karl T.; Charles, Lauren E.
author_sort	Dixon, Samuel
collection	PubMed
description	Accurate infectious disease forecasting can inform efforts to prevent outbreaks and mitigate adverse impacts. This study compares the performance of statistical, machine learning (ML), and deep learning (DL) approaches in forecasting infectious disease incidences across different countries and time intervals. We forecasted three diverse diseases: campylobacteriosis, typhoid, and Q-fever, using a wide variety of features (n = 46) from public datasets, e.g., landscape, climate, and socioeconomic factors. We compared autoregressive statistical models to two tree-based ML models (extreme gradient boosted trees [XGB] and random forest [RF]) and two DL models (multi-layer perceptron and encoder–decoder model). The disease models were trained on data from seven different countries at the region-level between 2009–2017. Forecasting performance of all models was assessed using mean absolute error, root mean square error, and Poisson deviance across Australia, Israel, and the United States for the months of January through August of 2018. The overall model results were compared across diseases as well as various data splits, including country, regions with highest and lowest cases, and the forecasted months out (i.e., nowcasting, short-term, and long-term forecasting). Overall, the XGB models performed the best for all diseases and, in general, tree-based ML models performed the best when looking at data splits. There were a few instances where the statistical or DL models had minutely smaller error metrics for specific subsets of typhoid, which is a disease with very low case counts. Feature importance per disease was measured by using four tree-based ML models (i.e., XGB and RF with and without region name as a feature). The most important feature groups included previous case counts, region name, population counts and density, mortality causes of neonatal to under 5 years of age, sanitation factors, and elevation. This study demonstrates the power of ML approaches to incorporate a wide range of factors to forecast various diseases, regardless of location, more accurately than traditional statistical approaches.
format	Online Article Text
id	pubmed-8875569
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8875569 2022-02-26 Pathogens Article MDPI 2022-01-29 /pmc/articles/PMC8875569/ /pubmed/35215129 http://dx.doi.org/10.3390/pathogens11020185 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	A Comparison of Infectious Disease Forecasting Methods across Locations, Diseases, and Time
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8875569/ https://www.ncbi.nlm.nih.gov/pubmed/35215129 http://dx.doi.org/10.3390/pathogens11020185

Similar Items