Rapid geographical source attribution of Salmonella enterica serovar Enteritidis genomes using hierarchical machine learning

Salmonella enterica serovar Enteritidis is one of the most frequent causes of Salmonellosis globally and is commonly transmitted from animals to humans by the consumption of contaminated foodstuffs. In the UK and many other countries in the Global North, a significant proportion of cases are caused...

Bibliographic Details
Main authors: Bayliss, Sion C; Locke, Rebecca K; Jenkins, Claire; Chattaway, Marie Anne; Dallman, Timothy J; Cowley, Lauren A
Format: Online Article Text
Language: English
Published: eLife Sciences Publications, Ltd 2023
Subjects: Epidemiology and Global Health
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10147375/
https://www.ncbi.nlm.nih.gov/pubmed/37042517
http://dx.doi.org/10.7554/eLife.84167
author Bayliss, Sion C
Locke, Rebecca K
Jenkins, Claire
Chattaway, Marie Anne
Dallman, Timothy J
Cowley, Lauren A
collection PubMed
description Salmonella enterica serovar Enteritidis is one of the most frequent causes of salmonellosis globally and is commonly transmitted from animals to humans through the consumption of contaminated foodstuffs. In the UK and many other countries in the Global North, a significant proportion of cases are caused by the consumption of imported food products or are contracted during foreign travel, making rapid identification of the geographical source of new infections a requirement for robust public health outbreak investigations. Herein, we detail the development and application of a hierarchical machine learning model to rapidly identify and trace the geographical source of S. Enteritidis infections from whole genome sequencing data. A total of 2313 S. Enteritidis genomes, collected by the UKHSA between 2014 and 2019, were used to train a ‘local classifier per node’ hierarchical classifier to attribute isolates to four continents, 11 sub-regions, and 38 countries (53 classes). The highest classification accuracy was achieved at the continental level, followed by the sub-regional and country levels (macro F1: 0.954, 0.718, and 0.661, respectively). A number of countries commonly visited by UK travelers were predicted with high accuracy (hF1: >0.9). Longitudinal analysis and validation with publicly accessible international samples indicated that predictions were robust to prospective external datasets. The hierarchical machine learning framework provided granular geographical source prediction directly from sequencing reads in <4 min per sample, facilitating rapid outbreak resolution and real-time genomic epidemiology. The results suggest that additional application to a broader range of pathogens and other geographically structured problems, such as antimicrobial resistance prediction, is warranted.
format Online
Article
Text
id pubmed-10147375
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher eLife Sciences Publications, Ltd
record_format MEDLINE/PubMed
spelling pubmed-10147375 2023-04-29
eLife
Epidemiology and Global Health
eLife Sciences Publications, Ltd 2023-04-12
/pmc/articles/PMC10147375/
/pubmed/37042517
http://dx.doi.org/10.7554/eLife.84167
Text en
© 2023, Bayliss et al
https://creativecommons.org/licenses/by/4.0/
This article is distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use and redistribution provided that the original author and source are credited.
title Rapid geographical source attribution of Salmonella enterica serovar Enteritidis genomes using hierarchical machine learning
topic Epidemiology and Global Health
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10147375/
https://www.ncbi.nlm.nih.gov/pubmed/37042517
http://dx.doi.org/10.7554/eLife.84167
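
Note on the hierarchical F1 (hF1) figure quoted in the description: the record does not define the metric. The sketch below assumes the commonly used definition, in which each true and predicted label is expanded to include all of its ancestors and precision and recall are computed over the combined node sets. The hierarchy, the label names, and the helper functions path_to_root and hierarchical_f1 are hypothetical and chosen only for illustration; they are not the class set or the code used by the authors.

```python
# Toy continent > sub-region > country hierarchy.
# Illustrative assumption only; not the 53 classes used in the paper.
PARENT = {
    "Spain": "Southern Europe",
    "Italy": "Southern Europe",
    "Southern Europe": "Europe",
    "Poland": "Eastern Europe",
    "Eastern Europe": "Europe",
    "Egypt": "Northern Africa",
    "Northern Africa": "Africa",
}


def path_to_root(label):
    """Return the label together with all of its ancestors as a set."""
    nodes = {label}
    while label in PARENT:
        label = PARENT[label]
        nodes.add(label)
    return nodes


def hierarchical_f1(true_labels, predicted_labels):
    """Hierarchical F1 over ancestor-augmented label sets (assumed definition)."""
    overlap = pred_total = true_total = 0
    for true_label, pred_label in zip(true_labels, predicted_labels):
        true_nodes = path_to_root(true_label)
        pred_nodes = path_to_root(pred_label)
        overlap += len(true_nodes & pred_nodes)
        pred_total += len(pred_nodes)
        true_total += len(true_nodes)
    if overlap == 0:
        return 0.0
    h_precision = overlap / pred_total
    h_recall = overlap / true_total
    return 2 * h_precision * h_recall / (h_precision + h_recall)


if __name__ == "__main__":
    truth = ["Spain", "Poland", "Egypt"]
    predicted = ["Italy", "Poland", "Spain"]
    print(round(hierarchical_f1(truth, predicted), 3))
```

In this toy case only one of three country calls is exactly right, yet the Spain/Italy confusion still earns partial credit because the sub-region and continent match. Under the assumed definition, this is why a hierarchical score can exceed a flat country-level score.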