
Machine learning identifies signatures of host adaptation in the bacterial pathogen Salmonella enterica


Bibliographic Details
Main authors: Wheeler, Nicole E., Gardner, Paul P., Barquist, Lars
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5940178/
https://www.ncbi.nlm.nih.gov/pubmed/29738521
http://dx.doi.org/10.1371/journal.pgen.1007333
collection PubMed
description Emerging pathogens are a major threat to public health; however, understanding how pathogens adapt to new niches remains a challenge. New methods are urgently required to provide functional insights into pathogens from the massive genomic data sets now being generated by routine pathogen surveillance for epidemiological purposes. Here, we measure the burden of atypical mutations in protein-coding genes across independently evolved Salmonella enterica lineages, and use these as input to train a random forest classifier to identify strains associated with extraintestinal disease. Members of the species fall along a continuum, from pathovars that cause gastrointestinal infection and low mortality, associated with a broad host range, to those that cause invasive infection and high mortality, associated with a narrowed host range. Our random forest classifier learned to perfectly discriminate long-established gastrointestinal and invasive serovars of Salmonella. Additionally, it was able to discriminate recently emerged Salmonella Enteritidis and Typhimurium lineages associated with invasive disease in immunocompromised populations in sub-Saharan Africa, and within-host adaptation to invasive infection. We dissect the architecture of the model to identify the genes that were most informative of phenotype, revealing a common theme of degradation of metabolic pathways in extraintestinal lineages. This approach accurately identifies patterns of gene degradation and diversifying selection specific to invasive serovars that have been captured by more labour-intensive investigations, but can be readily scaled to larger analyses.
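The workflow the abstract describes (per-gene mutation-burden scores as features, a random forest as classifier, then inspection of the model for informative genes) can be illustrated with a small sketch. This is not the authors' pipeline: the scores, the six-gene layout, and every function name below are hypothetical, and single-split trees (stumps) stand in for full decision trees so that the example stays short and needs only the Python standard library.

```python
import random
from collections import Counter

# Hypothetical toy data: rows are strains, columns are six genes, and values are
# made-up mutation-burden scores. Genes 0-2 carry a higher burden in the
# extraintestinal strains; genes 3-5 are noise.
X = [
    [0.1, 0.3, 0.2, 0.1, 0.8, 0.2],
    [0.2, 0.0, 0.4, 0.3, 0.4, 0.6],
    [0.0, 0.2, 0.1, 0.5, 0.6, 0.3],
    [0.3, 0.1, 0.0, 0.7, 0.2, 0.7],
    [2.1, 1.6, 2.3, 0.2, 0.7, 0.1],
    [1.8, 2.2, 1.7, 0.4, 0.3, 0.5],
    [2.5, 1.9, 2.0, 0.6, 0.5, 0.4],
    [1.9, 2.4, 2.6, 0.8, 0.1, 0.8],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = gastrointestinal, 1 = extraintestinal


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(X, y, features):
    """Return the single split (over the given genes) with the lowest weighted Gini."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive observed values,
        # so both sides of every candidate split are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                left_vote = Counter(left).most_common(1)[0][0]
                right_vote = Counter(right).most_common(1)[0][0]
                best = (score, f, t, left_vote, right_vote)
    return best  # None if no gene varies within this sample


def fit_forest(X, y, n_trees=200, n_features=4, seed=0):
    """Bagging plus random gene subsets: the two ingredients of a random forest."""
    rng = random.Random(seed)
    n, n_genes = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # bootstrap sample of strains
        features = rng.sample(range(n_genes), n_features)  # random subset of genes
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], features)
        if stump is not None:
            forest.append(stump)
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = sum(lv if row[f] <= t else rv for _, f, t, lv, rv in forest)
    return int(2 * votes > len(forest))


def gene_importance(forest):
    """Crude importance: how often each gene was chosen as the split."""
    return Counter(f for _, f, _, _, _ in forest)


forest = fit_forest(X, y)
importance = gene_importance(forest)
print(predict(forest, [2.0, 1.7, 2.2, 0.45, 0.45, 0.45]))  # a high-burden strain
print(importance.most_common())
```

Counting how often a gene is selected as a split is a deliberately simple stand-in for the importance measures that full random forest implementations report; it is used here only to show how a fitted model can be read back to rank genes.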
id pubmed-5940178
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5940178 2018-05-18 PLoS Genet Research Article
Public Library of Science 2018-05-08 /pmc/articles/PMC5940178/ /pubmed/29738521 http://dx.doi.org/10.1371/journal.pgen.1007333 Text en © 2018 Wheeler et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article