High-Efficiency Machine Learning Method for Identifying Foodborne Disease Outbreaks and Confounding Factors

The China National Center for Food Safety Risk Assessment (CFSA) uses the Foodborne Disease Monitoring and Reporting System (FDMRS) to monitor outbreaks of foodborne diseases across the country. However, there are problems of underreporting or erroneous reporting in FDMRS, which significantly increa...

Bibliographic Details
Main authors: Zhang, Peng; Cui, Wenjuan; Wang, Hanxue; Du, Yi; Zhou, Yuanchun
Format: Online Article Text
Language: English
Published: Mary Ann Liebert, Inc., publishers 2021
Subjects: Original Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8390778/
https://www.ncbi.nlm.nih.gov/pubmed/33902323
http://dx.doi.org/10.1089/fpd.2020.2913
_version_ 1783743142907871232
author Zhang, Peng
Cui, Wenjuan
Wang, Hanxue
Du, Yi
Zhou, Yuanchun
collection PubMed
description The China National Center for Food Safety Risk Assessment (CFSA) uses the Foodborne Disease Monitoring and Reporting System (FDMRS) to monitor outbreaks of foodborne diseases across the country. However, there are problems of underreporting or erroneous reporting in FDMRS, which significantly increase the cost of related epidemic investigations. To solve this problem, we designed a model to identify suspected outbreaks from the data generated by the FDMRS of CFSA. In this study, machine learning models were used to fit the data. The recall rate and F1-score were used as evaluation metrics to compare the classification performance of each model. Feature importance and pathogenic factors were identified and analyzed using tree-based and gradient boosting models. Three real foodborne disease outbreaks were then used to evaluate the best performing model. Furthermore, the SHapley Additive exPlanation value was used to identify the effect of features. Among all machine learning classification models, the eXtreme Gradient Boosting (XGBoost) model achieved the best performance, with the highest recall rate and F1-score of 0.9699 and 0.9582, respectively. In terms of model validation, the model provides a correct judgment of real outbreaks. In the feature importance analysis with the XGBoost model, the health status of the other people with the same exposure has the highest weight, reaching 0.65. The machine learning model built in this study exhibits high accuracy in recognizing foodborne disease outbreaks, thus reducing the manual burden for medical staff. The model helped us identify the confounding factors of foodborne disease outbreaks. Attention should be paid not only to the health status of those with the same exposure but also to the similarity of the cases in time and space.
format Online
Article
Text
id pubmed-8390778
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Mary Ann Liebert, Inc., publishers
record_format MEDLINE/PubMed
spelling pubmed-8390778 2021-09-01 High-Efficiency Machine Learning Method for Identifying Foodborne Disease Outbreaks and Confounding Factors Zhang, Peng Cui, Wenjuan Wang, Hanxue Du, Yi Zhou, Yuanchun Foodborne Pathog Dis Original Articles Mary Ann Liebert, Inc., publishers 2021-08-01 2021-08-12 /pmc/articles/PMC8390778/ /pubmed/33902323 http://dx.doi.org/10.1089/fpd.2020.2913 Text en © Peng Zhang et al. 2021; Published by Mary Ann Liebert, Inc. This Open Access article is distributed under the terms of the Creative Commons Attribution Noncommercial License [CC-BY-NC] (https://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and the source are cited.
title High-Efficiency Machine Learning Method for Identifying Foodborne Disease Outbreaks and Confounding Factors
topic Original Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8390778/
https://www.ncbi.nlm.nih.gov/pubmed/33902323
http://dx.doi.org/10.1089/fpd.2020.2913