Strain Tracking with Uncertainty Quantification

The ability to detect and quantify microbiota over time has a plethora of clinical, basic science, and public health applications. One of the primary means of tracking microbiota is through sequencing technologies. When the microorganism of interest is well characterized or known a priori, targeted...

Bibliographic Details
Main authors: Kim, Younhun, Worby, Colin J., Acharya, Sawal, van Dijk, Lucas R., Alfonsetti, Daniel, Gromko, Zackary, Azimzadeh, Philippe, Dodson, Karen, Gerber, Georg, Hultgren, Scott, Earl, Ashlee M., Berger, Bonnie, Gibson, Travis E.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900846/
https://www.ncbi.nlm.nih.gov/pubmed/36747646
http://dx.doi.org/10.1101/2023.01.25.525531
author Kim, Younhun
Worby, Colin J.
Acharya, Sawal
van Dijk, Lucas R.
Alfonsetti, Daniel
Gromko, Zackary
Azimzadeh, Philippe
Dodson, Karen
Gerber, Georg
Hultgren, Scott
Earl, Ashlee M.
Berger, Bonnie
Gibson, Travis E.
collection PubMed
description The ability to detect and quantify microbiota over time has a plethora of clinical, basic science, and public health applications. One of the primary means of tracking microbiota is through sequencing technologies. When the microorganism of interest is well characterized or known a priori, targeted sequencing is often used. In many applications, however, untargeted bulk (shotgun) sequencing is more appropriate; for instance, the tracking of infection transmission events and nucleotide variants across multiple genomic loci, or studying the role of multiple genes in a particular phenotype. Given these applications, and the observation that pathogens (e.g., Clostridioides difficile, Escherichia coli, Salmonella enterica) and other taxa of interest can reside at low relative abundance in the gastrointestinal tract, there is a critical need for algorithms that accurately track low-abundance taxa with strain-level resolution. Here we present a sequence quality- and time-aware model, ChronoStrain, that introduces uncertainty quantification to gauge low-abundance species and significantly outperforms the current state of the art on both real and synthetic data. ChronoStrain leverages sequences’ quality scores and the samples’ temporal information to produce a probability distribution over abundance trajectories for each strain tracked in the model. We demonstrate ChronoStrain’s improved performance in capturing post-antibiotic E. coli strain blooms among women with recurrent urinary tract infections (UTIs) from the UTI Microbiome (UMB) Project. Other strain tracking models on the same data either show inconsistent temporal colonization or can only track consistently using very coarse groupings. In contrast, our probabilistic outputs can reveal the relationship between low-confidence strains present in the sample that cannot be reliably assigned a single reference label (either due to poor coverage or novelty) while simultaneously calling high-confidence strains that can be unambiguously assigned a label. We also include and analyze newly sequenced cultured samples from the UMB Project.
format Online
Article
Text
id pubmed-9900846
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-9900846 2023-02-07 Strain Tracking with Uncertainty Quantification bioRxiv Article Cold Spring Harbor Laboratory 2023-01-26 /pmc/articles/PMC9900846/ /pubmed/36747646 http://dx.doi.org/10.1101/2023.01.25.525531 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Strain Tracking with Uncertainty Quantification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900846/
https://www.ncbi.nlm.nih.gov/pubmed/36747646
http://dx.doi.org/10.1101/2023.01.25.525531