
Machine learning to predict the source of campylobacteriosis using whole genome data

Campylobacteriosis is among the world’s most common foodborne illnesses, caused predominantly by the bacterium Campylobacter jejuni. Effective interventions require determination of the infection source, which is challenging as transmission occurs via multiple sources such as contaminated meat, poult...


Bibliographic Details
Main authors: Arning, Nicolas; Sheppard, Samuel K.; Bayliss, Sion; Clifton, David A.; Wilson, Daniel J.
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8553134/
https://www.ncbi.nlm.nih.gov/pubmed/34662334
http://dx.doi.org/10.1371/journal.pgen.1009436
author Arning, Nicolas
Sheppard, Samuel K.
Bayliss, Sion
Clifton, David A.
Wilson, Daniel J.
collection PubMed
description Campylobacteriosis is among the world’s most common foodborne illnesses, caused predominantly by the bacterium Campylobacter jejuni. Effective interventions require determination of the infection source, which is challenging as transmission occurs via multiple sources such as contaminated meat, poultry, and drinking water. Strain variation has allowed source tracking based upon allelic variation in multi-locus sequence typing (MLST) genes, allowing isolates from infected individuals to be attributed to specific animal or environmental reservoirs. However, the accuracy of probabilistic attribution models has been limited by the ability to differentiate isolates based upon just 7 MLST genes. Here, we broaden the input data spectrum to include core genome MLST (cgMLST) and whole genome sequences (WGS), and implement multiple machine learning algorithms, allowing more accurate source attribution. We increase attribution accuracy from 64% using the standard iSource population genetic approach to 71% for MLST, 85% for cgMLST and 78% for kmerized WGS data using the classifier we named aiSource. To gain insight beyond the source model prediction, we use Bayesian inference to analyse the relative affinity of C. jejuni strains to infect humans and identify potential differences in source-human transmission ability among clonally related isolates in the most common disease-causing lineage (ST-21 clonal complex). With generalizable, computationally efficient methods based upon machine learning and population genetics, we provide a scalable approach to global disease surveillance that can continuously incorporate novel samples for source attribution and identify fine-scale variation in transmission potential.
format Online
Article
Text
id pubmed-8553134
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8553134 2021-10-29 Machine learning to predict the source of campylobacteriosis using whole genome data Arning, Nicolas Sheppard, Samuel K. Bayliss, Sion Clifton, David A. Wilson, Daniel J. PLoS Genet Research Article
Public Library of Science 2021-10-18 /pmc/articles/PMC8553134/ /pubmed/34662334 http://dx.doi.org/10.1371/journal.pgen.1009436 Text en © 2021 Arning et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Machine learning to predict the source of campylobacteriosis using whole genome data
topic Research Article