Application of Whole‐Genome Sequences and Machine Learning in Source Attribution of Salmonella Typhimurium

Prevention of the emergence and spread of foodborne diseases is an important prerequisite for the improvement of public health. Source attribution models link sporadic human cases of a specific illness to food sources and animal reservoirs. With the next generation sequencing technology, it is possi...

Bibliographic Details
Main authors: Munck, Nanna; Njage, Patrick Murigu Kamau; Leekitcharoenphon, Pimlapas; Litrup, Eva; Hald, Tine
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7540586/
https://www.ncbi.nlm.nih.gov/pubmed/32515055
http://dx.doi.org/10.1111/risa.13510
author Munck, Nanna
Njage, Patrick Murigu Kamau
Leekitcharoenphon, Pimlapas
Litrup, Eva
Hald, Tine
collection PubMed
description Prevention of the emergence and spread of foodborne diseases is an important prerequisite for the improvement of public health. Source attribution models link sporadic human cases of a specific illness to food sources and animal reservoirs. With next‐generation sequencing technology, it is possible to develop novel source attribution models. We investigated the potential of machine learning to predict, based on whole‐genome sequencing, the animal reservoir from which a bacterial strain isolated from a human salmonellosis case originated. Machine learning methods recognize patterns in large and complex data sets and use this knowledge to build models. The model learns patterns associated with genetic variations in bacteria isolated from the different animal reservoirs. We selected different machine learning algorithms to predict sources of human salmonellosis cases and trained the model with Danish Salmonella Typhimurium isolates sampled from broilers (n = 34), cattle (n = 2), ducks (n = 11), layers (n = 4), and pigs (n = 159). Using cgMLST as input features, the model yielded an average accuracy of 0.783 (95% CI: 0.77–0.80) in the source prediction for the random forest and 0.933 (95% CI: 0.92–0.94) for the logit boost algorithm. The logit boost algorithm was the most accurate (valid accuracy: 92%, CI: 0.8706–0.9579) and predicted the origin of 81% of the domestic sporadic human salmonellosis cases. The most important source was Danish‐produced pigs (53%), followed by imported pigs (16%), imported broilers (6%), imported ducks (2%), Danish‐produced layers (2%), and Danish‐produced cattle and imported cattle (<1%), while 18% of cases were not predicted. Machine learning has potential for improving source attribution modeling based on sequence data. Results of such models can help risk managers identify and prioritize food safety interventions.
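As a rough illustration of why the class sizes above matter when reading the reported accuracies, the following Python sketch computes the share of the largest training class, which is the accuracy a classifier would reach on this training set by always predicting that class. The counts are taken from the abstract; the function name `majority_baseline` and the comparison itself are illustrative assumptions, not part of the authors' analysis, and the abstract does not state how the reported accuracies were estimated.

```python
# Training-set sizes per animal reservoir, as reported in the abstract.
training_counts = {
    "broilers": 34,
    "cattle": 2,
    "ducks": 11,
    "layers": 4,
    "pigs": 159,
}


def majority_baseline(counts):
    """Return the largest class and its share of all isolates.

    Hypothetical helper for illustration only: the share equals the
    accuracy of always predicting the largest class on this set.
    """
    total = sum(counts.values())
    label = max(counts, key=counts.get)
    return label, counts[label] / total


total_isolates = sum(training_counts.values())
label, share = majority_baseline(training_counts)

print(f"total isolates: {total_isolates}")
print(f"majority class: {label} ({share:.3f})")
```

Under these assumptions the script reports 210 isolates in total and a majority share of 0.757 for pigs. Read against that figure, the random forest accuracy of 0.783 is only modestly above a trivial baseline, while the logit boost accuracy of 0.933 is well above it. The comparison is indicative only.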
format Online
Article
Text
id pubmed-7540586
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-7540586 2020-10-15 Application of Whole‐Genome Sequences and Machine Learning in Source Attribution of Salmonella Typhimurium Munck, Nanna Njage, Patrick Murigu Kamau Leekitcharoenphon, Pimlapas Litrup, Eva Hald, Tine Risk Anal Original Research Articles John Wiley and Sons Inc. 2020-06-08 2020-09 /pmc/articles/PMC7540586/ /pubmed/32515055 http://dx.doi.org/10.1111/risa.13510 Text en © 2020 The Authors. Risk Analysis published by Wiley Periodicals LLC on behalf of Society for Risk Analysis This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title Application of Whole‐Genome Sequences and Machine Learning in Source Attribution of Salmonella Typhimurium
topic Original Research Articles