
Scalable Microbial Strain Inference in Metagenomic Data Using StrainFacts

While genome databases are nearing a complete catalog of species commonly inhabiting the human gut, their representation of intraspecific diversity is lacking for all but the most abundant and frequently studied taxa. Statistical deconvolution of allele frequencies from shotgun metagenomic data into...


Bibliographic Details
Main Authors: Smith, Byron J., Li, Xiangpeng, Shi, Zhou Jason, Abate, Adam, Pollard, Katherine S.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580935/
https://www.ncbi.nlm.nih.gov/pubmed/36304283
http://dx.doi.org/10.3389/fbinf.2022.867386
author Smith, Byron J.
Li, Xiangpeng
Shi, Zhou Jason
Abate, Adam
Pollard, Katherine S.
collection PubMed
description While genome databases are nearing a complete catalog of species commonly inhabiting the human gut, their representation of intraspecific diversity is lacking for all but the most abundant and frequently studied taxa. Statistical deconvolution of allele frequencies from shotgun metagenomic data into strain genotypes and relative abundances is a promising approach, but existing methods are limited by computational scalability. Here we introduce StrainFacts, a method for strain deconvolution that enables inference across tens of thousands of metagenomes. We harness a “fuzzy” genotype approximation that makes the underlying graphical model fully differentiable, unlike existing methods. This allows parameter estimates to be optimized with gradient-based methods, speeding up model fitting by two orders of magnitude. A GPU implementation provides additional scalability. Extensive simulations show that StrainFacts can perform strain inference on thousands of metagenomes and has comparable accuracy to more computationally intensive tools. We further validate our strain inferences using single-cell genomic sequencing from a human stool sample. Applying StrainFacts to a collection of more than 10,000 publicly available human stool metagenomes, we quantify patterns of strain diversity, biogeography, and linkage-disequilibrium that agree with and expand on what is known based on existing reference genomes. StrainFacts paves the way for large-scale biogeography and population genetic studies of microbiomes using metagenomic data.
format Online
Article
Text
id pubmed-9580935
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-95809352022-10-26 Scalable Microbial Strain Inference in Metagenomic Data Using StrainFacts Smith, Byron J. Li, Xiangpeng Shi, Zhou Jason Abate, Adam Pollard, Katherine S. Front Bioinform Bioinformatics While genome databases are nearing a complete catalog of species commonly inhabiting the human gut, their representation of intraspecific diversity is lacking for all but the most abundant and frequently studied taxa. Statistical deconvolution of allele frequencies from shotgun metagenomic data into strain genotypes and relative abundances is a promising approach, but existing methods are limited by computational scalability. Here we introduce StrainFacts, a method for strain deconvolution that enables inference across tens of thousands of metagenomes. We harness a “fuzzy” genotype approximation that makes the underlying graphical model fully differentiable, unlike existing methods. This allows parameter estimates to be optimized with gradient-based methods, speeding up model fitting by two orders of magnitude. A GPU implementation provides additional scalability. Extensive simulations show that StrainFacts can perform strain inference on thousands of metagenomes and has comparable accuracy to more computationally intensive tools. We further validate our strain inferences using single-cell genomic sequencing from a human stool sample. Applying StrainFacts to a collection of more than 10,000 publicly available human stool metagenomes, we quantify patterns of strain diversity, biogeography, and linkage-disequilibrium that agree with and expand on what is known based on existing reference genomes. StrainFacts paves the way for large-scale biogeography and population genetic studies of microbiomes using metagenomic data. Frontiers Media S.A. 2022-05-16 /pmc/articles/PMC9580935/ /pubmed/36304283 http://dx.doi.org/10.3389/fbinf.2022.867386 Text en Copyright © 2022 Smith, Li, Shi, Abate and Pollard. 
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Scalable Microbial Strain Inference in Metagenomic Data Using StrainFacts
topic Bioinformatics