Strain Tracking with Uncertainty Quantification
The ability to detect and quantify microbiota over time has a plethora of clinical, basic science, and public health applications. One of the primary means of tracking microbiota is through sequencing technologies. When the microorganism of interest is well characterized or known a priori, targeted...
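Main authors: | Kim, Younhun; Worby, Colin J.; Acharya, Sawal; van Dijk, Lucas R.; Alfonsetti, Daniel; Gromko, Zackary; Azimzadeh, Philippe; Dodson, Karen; Gerber, Georg; Hultgren, Scott; Earl, Ashlee M.; Berger, Bonnie; Gibson, Travis E. |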
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900846/ https://www.ncbi.nlm.nih.gov/pubmed/36747646 http://dx.doi.org/10.1101/2023.01.25.525531 |
_version_ | 1784882926991704064 |
---|---|
author | Kim, Younhun Worby, Colin J. Acharya, Sawal van Dijk, Lucas R. Alfonsetti, Daniel Gromko, Zackary Azimzadeh, Philippe Dodson, Karen Gerber, Georg Hultgren, Scott Earl, Ashlee M. Berger, Bonnie Gibson, Travis E. |
author_facet | Kim, Younhun Worby, Colin J. Acharya, Sawal van Dijk, Lucas R. Alfonsetti, Daniel Gromko, Zackary Azimzadeh, Philippe Dodson, Karen Gerber, Georg Hultgren, Scott Earl, Ashlee M. Berger, Bonnie Gibson, Travis E. |
author_sort | Kim, Younhun |
collection | PubMed |
description | The ability to detect and quantify microbiota over time has a plethora of clinical, basic science, and public health applications. One of the primary means of tracking microbiota is through sequencing technologies. When the microorganism of interest is well characterized or known a priori, targeted sequencing is often used. In many applications, however, untargeted bulk (shotgun) sequencing is more appropriate; for instance, the tracking of infection transmission events and nucleotide variants across multiple genomic loci, or studying the role of multiple genes in a particular phenotype. Given these applications, and the observation that pathogens (e.g. Clostridioides difficile, Escherichia coli, Salmonella enterica) and other taxa of interest can reside at low relative abundance in the gastrointestinal tract, there is a critical need for algorithms that accurately track low-abundance taxa with strain-level resolution. Here we present a sequence quality- and time-aware model, ChronoStrain, that introduces uncertainty quantification to gauge low-abundance species and significantly outperforms the current state-of-the-art on both real and synthetic data. ChronoStrain leverages sequences’ quality scores and the samples’ temporal information to produce a probability distribution over abundance trajectories for each strain tracked in the model. We demonstrate ChronoStrain’s improved performance in capturing post-antibiotic E. coli strain blooms among women with recurrent urinary tract infections (UTIs) from the UTI Microbiome (UMB) Project. Other strain tracking models on the same data either show inconsistent temporal colonization or can only track consistently using very coarse groupings. 
In contrast, our probabilistic outputs can reveal the relationship between low-confidence strains present in the sample that cannot be reliably assigned a single reference label (either due to poor coverage or novelty) while simultaneously calling high-confidence strains that can be unambiguously assigned a label. We also include and analyze newly sequenced cultured samples from the UMB Project. |
format | Online Article Text |
id | pubmed-9900846 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-99008462023-02-07 Strain Tracking with Uncertainty Quantification Kim, Younhun Worby, Colin J. Acharya, Sawal van Dijk, Lucas R. Alfonsetti, Daniel Gromko, Zackary Azimzadeh, Philippe Dodson, Karen Gerber, Georg Hultgren, Scott Earl, Ashlee M. Berger, Bonnie Gibson, Travis E. bioRxiv Article The ability to detect and quantify microbiota over time has a plethora of clinical, basic science, and public health applications. One of the primary means of tracking microbiota is through sequencing technologies. When the microorganism of interest is well characterized or known a priori, targeted sequencing is often used. In many applications, however, untargeted bulk (shotgun) sequencing is more appropriate; for instance, the tracking of infection transmission events and nucleotide variants across multiple genomic loci, or studying the role of multiple genes in a particular phenotype. Given these applications, and the observation that pathogens (e.g. Clostridioides difficile, Escherichia coli, Salmonella enterica) and other taxa of interest can reside at low relative abundance in the gastrointestinal tract, there is a critical need for algorithms that accurately track low-abundance taxa with strain level resolution. Here we present a sequence quality- and time-aware model, ChronoStrain, that introduces uncertainty quantification to gauge low-abundance species and significantly outperforms the current state-of-the-art on both real and synthetic data. ChronoStrain leverages sequences’ quality scores and the samples’ temporal information to produce a probability distribution over abundance trajectories for each strain tracked in the model. We demonstrate Chronostrain’s improved performance in capturing post-antibiotic E. coli strain blooms among women with recurrent urinary tract infections (UTIs) from the UTI Microbiome (UMB) Project. 
Other strain tracking models on the same data either show inconsistent temporal colonization or can only track consistently using very coarse groupings. In contrast, our probabilistic outputs can reveal the relationship between low-confidence strains present in the sample that cannot be reliably assigned a single reference label (either due to poor coverage or novelty) while simultaneously calling high-confidence strains that can be unambiguously assigned a label. We also include and analyze newly sequenced cultured samples from the UMB Project. Cold Spring Harbor Laboratory 2023-01-26 /pmc/articles/PMC9900846/ /pubmed/36747646 http://dx.doi.org/10.1101/2023.01.25.525531 Text en https://creativecommons.org/licenses/by/4.0/This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/) , which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
spellingShingle | Article Kim, Younhun Worby, Colin J. Acharya, Sawal van Dijk, Lucas R. Alfonsetti, Daniel Gromko, Zackary Azimzadeh, Philippe Dodson, Karen Gerber, Georg Hultgren, Scott Earl, Ashlee M. Berger, Bonnie Gibson, Travis E. Strain Tracking with Uncertainty Quantification |
title | Strain Tracking with Uncertainty Quantification |
title_full | Strain Tracking with Uncertainty Quantification |
title_fullStr | Strain Tracking with Uncertainty Quantification |
title_full_unstemmed | Strain Tracking with Uncertainty Quantification |
title_short | Strain Tracking with Uncertainty Quantification |
title_sort | strain tracking with uncertainty quantification |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900846/ https://www.ncbi.nlm.nih.gov/pubmed/36747646 http://dx.doi.org/10.1101/2023.01.25.525531 |
work_keys_str_mv | AT kimyounhun straintrackingwithuncertaintyquantification AT worbycolinj straintrackingwithuncertaintyquantification AT acharyasawal straintrackingwithuncertaintyquantification AT vandijklucasr straintrackingwithuncertaintyquantification AT alfonsettidaniel straintrackingwithuncertaintyquantification AT gromkozackary straintrackingwithuncertaintyquantification AT azimzadehphilippe straintrackingwithuncertaintyquantification AT dodsonkaren straintrackingwithuncertaintyquantification AT gerbergeorg straintrackingwithuncertaintyquantification AT hultgrenscott straintrackingwithuncertaintyquantification AT earlashleem straintrackingwithuncertaintyquantification AT bergerbonnie straintrackingwithuncertaintyquantification AT gibsontravise straintrackingwithuncertaintyquantification |