Strain level microbial detection and quantification with applications to single cell metagenomics

Computational identification and quantification of distinct microbes from high throughput sequencing data is crucial for our understanding of human health. Existing methods either use accurate but computationally expensive alignment-based approaches or less accurate but computationally fast alignmen...

Bibliographic Details
Main authors: Zhu, Kaiyuan; Schäffer, Alejandro A.; Robinson, Welles; Xu, Junyan; Ruppin, Eytan; Ergun, A. Funda; Ye, Yuzhen; Sahinalp, S. Cenk
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9616933/
https://www.ncbi.nlm.nih.gov/pubmed/36307411
http://dx.doi.org/10.1038/s41467-022-33869-7
author Zhu, Kaiyuan
Schäffer, Alejandro A.
Robinson, Welles
Xu, Junyan
Ruppin, Eytan
Ergun, A. Funda
Ye, Yuzhen
Sahinalp, S. Cenk
collection PubMed
description Computational identification and quantification of distinct microbes from high-throughput sequencing data is crucial for our understanding of human health. Existing methods either use accurate but computationally expensive alignment-based approaches or less accurate but computationally fast alignment-free approaches, which often fail to correctly assign reads to genomes. Here we introduce CAMMiQ, a combinatorial optimization framework to identify and quantify distinct genomes (specified by a database) in a metagenomic dataset. As a key methodological innovation, CAMMiQ uses substrings of variable length and those that appear in two genomes in the database, as opposed to the commonly used fixed-length, unique substrings. These substrings make it possible to accurately decouple mixtures of highly similar genomes, resulting in higher accuracy than the leading alternatives, without requiring additional computational resources, as demonstrated on commonly used benchmarking datasets. Importantly, we show that CAMMiQ can distinguish closely related bacterial strains in simulated metagenomic and real single-cell metatranscriptomic data.
format Online
Article
Text
id pubmed-9616933
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9616933 2022-10-30 Nat Commun Article Nature Publishing Group UK 2022-10-28 /pmc/articles/PMC9616933/ /pubmed/36307411 http://dx.doi.org/10.1038/s41467-022-33869-7 Text en
© This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply, 2022.
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Strain level microbial detection and quantification with applications to single cell metagenomics
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9616933/
https://www.ncbi.nlm.nih.gov/pubmed/36307411
http://dx.doi.org/10.1038/s41467-022-33869-7