Scalable Microbial Strain Inference in Metagenomic Data Using StrainFacts
While genome databases are nearing a complete catalog of species commonly inhabiting the human gut, their representation of intraspecific diversity is lacking for all but the most abundant and frequently studied taxa. Statistical deconvolution of allele frequencies from shotgun metagenomic data into...
Main Authors: | Smith, Byron J.; Li, Xiangpeng; Shi, Zhou Jason; Abate, Adam; Pollard, Katherine S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Bioinformatics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580935/ https://www.ncbi.nlm.nih.gov/pubmed/36304283 http://dx.doi.org/10.3389/fbinf.2022.867386 |
_version_ | 1784812504395808768 |
---|---|
author | Smith, Byron J.; Li, Xiangpeng; Shi, Zhou Jason; Abate, Adam; Pollard, Katherine S. |
author_sort | Smith, Byron J. |
collection | PubMed |
description | While genome databases are nearing a complete catalog of species commonly inhabiting the human gut, their representation of intraspecific diversity is lacking for all but the most abundant and frequently studied taxa. Statistical deconvolution of allele frequencies from shotgun metagenomic data into strain genotypes and relative abundances is a promising approach, but existing methods are limited by computational scalability. Here we introduce StrainFacts, a method for strain deconvolution that enables inference across tens of thousands of metagenomes. We harness a “fuzzy” genotype approximation that makes the underlying graphical model fully differentiable, unlike existing methods. This allows parameter estimates to be optimized with gradient-based methods, speeding up model fitting by two orders of magnitude. A GPU implementation provides additional scalability. Extensive simulations show that StrainFacts can perform strain inference on thousands of metagenomes and has comparable accuracy to more computationally intensive tools. We further validate our strain inferences using single-cell genomic sequencing from a human stool sample. Applying StrainFacts to a collection of more than 10,000 publicly available human stool metagenomes, we quantify patterns of strain diversity, biogeography, and linkage-disequilibrium that agree with and expand on what is known based on existing reference genomes. StrainFacts paves the way for large-scale biogeography and population genetic studies of microbiomes using metagenomic data. |
format | Online Article Text |
id | pubmed-9580935 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9580935 2022-10-26 Scalable Microbial Strain Inference in Metagenomic Data Using StrainFacts. Smith, Byron J.; Li, Xiangpeng; Shi, Zhou Jason; Abate, Adam; Pollard, Katherine S. Front Bioinform. Bioinformatics. Frontiers Media S.A. 2022-05-16 /pmc/articles/PMC9580935/ /pubmed/36304283 http://dx.doi.org/10.3389/fbinf.2022.867386 Text en Copyright © 2022 Smith, Li, Shi, Abate and Pollard. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Bioinformatics; Smith, Byron J.; Li, Xiangpeng; Shi, Zhou Jason; Abate, Adam; Pollard, Katherine S.; Scalable Microbial Strain Inference in Metagenomic Data Using StrainFacts |
title | Scalable Microbial Strain Inference in Metagenomic Data Using StrainFacts |
title_sort | scalable microbial strain inference in metagenomic data using strainfacts |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580935/ https://www.ncbi.nlm.nih.gov/pubmed/36304283 http://dx.doi.org/10.3389/fbinf.2022.867386 |