
Qmatey: an automated pipeline for fast exact matching-based alignment and strain-level taxonomic binning and profiling of metagenomes

Metagenomics is a powerful tool for understanding organismal interactions; however, classification, profiling and detection of interactions at the strain level remain challenging. We present an automated pipeline, quantitative metagenomic alignment and taxonomic exact matching (Qmatey), that perform...

Bibliographic Details
Main authors: Adams, Alison K; Kristy, Brandon D; Gorman, Myranda; Balint-Kurti, Peter; Yencho, G Craig; Olukolu, Bode A
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Problem Solving Protocol
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10569747/
https://www.ncbi.nlm.nih.gov/pubmed/37824740
http://dx.doi.org/10.1093/bib/bbad351
_version_ 1785119616326959104
author Adams, Alison K
Kristy, Brandon D
Gorman, Myranda
Balint-Kurti, Peter
Yencho, G Craig
Olukolu, Bode A
collection PubMed
description Metagenomics is a powerful tool for understanding organismal interactions; however, classification, profiling and detection of interactions at the strain level remain challenging. We present an automated pipeline, quantitative metagenomic alignment and taxonomic exact matching (Qmatey), that performs a fast exact matching-based alignment and integration of taxonomic binning and profiling. It interrogates large databases without using metagenome-assembled genomes, curated pan-genes or k-mer spectra that limit resolution. Qmatey minimizes misclassification and maintains strain level resolution by using only diagnostic reads as shown in the analysis of amplicon, quantitative reduced representation and shotgun sequencing datasets. Using Qmatey to analyze shotgun data from a synthetic community with 35% of the 26 strains at low abundance (0.01–0.06%), we revealed a remarkable 85–96% strain recall and 92–100% species recall while maintaining 100% precision. Benchmarking revealed that the highly ranked Kraken2 and KrakenUniq tools identified 2–4 more taxa (92–100% recall) than Qmatey but produced 315–1752 false positive taxa and high penalty on precision (1–8%). The speed, accuracy and precision of the Qmatey pipeline positions it as a valuable tool for broad-spectrum profiling and for uncovering biologically relevant interactions.
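The recall and precision figures quoted in the description can be related to one another with a short worked example. This is only an illustrative sketch: the total of 26 strains and the false-positive counts (315 and 1752) come from the abstract, but the true-positive counts (24 and 26) and the way they are paired with the false-positive counts are assumptions made here for the arithmetic, not values reported in the record. The function names are likewise hypothetical.

```python
def recall(true_positives: int, total_true: int) -> float:
    """Fraction of the taxa actually present that were detected."""
    return true_positives / total_true


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of the reported taxa that are actually present."""
    reported = true_positives + false_positives
    return true_positives / reported if reported else 0.0


TOTAL_STRAINS = 26  # size of the synthetic community, from the abstract

# Assumed pairings of (true positives, false positives); the abstract
# gives only ranges, so these combinations are illustrative.
scenarios = {
    "24 of 26 taxa found, 315 false positives": (24, 315),
    "26 of 26 taxa found, 1752 false positives": (26, 1752),
}

for label, (tp, fp) in scenarios.items():
    r = recall(tp, TOTAL_STRAINS)
    p = precision(tp, fp)
    print(f"{label}: recall={r:.1%}, precision={p:.1%}")
```

Under these assumed pairings, recall stays within the 92–100% range while precision falls to single digits, which agrees with the 1–8% range given in the abstract: a few additional true taxa are outweighed by hundreds of false positives.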
format Online
Article
Text
id pubmed-10569747
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10569747 2023-10-13 Qmatey: an automated pipeline for fast exact matching-based alignment and strain-level taxonomic binning and profiling of metagenomes Adams, Alison K Kristy, Brandon D Gorman, Myranda Balint-Kurti, Peter Yencho, G Craig Olukolu, Bode A Brief Bioinform Problem Solving Protocol Metagenomics is a powerful tool for understanding organismal interactions; however, classification, profiling and detection of interactions at the strain level remain challenging. We present an automated pipeline, quantitative metagenomic alignment and taxonomic exact matching (Qmatey), that performs a fast exact matching-based alignment and integration of taxonomic binning and profiling. It interrogates large databases without using metagenome-assembled genomes, curated pan-genes or k-mer spectra that limit resolution. Qmatey minimizes misclassification and maintains strain level resolution by using only diagnostic reads as shown in the analysis of amplicon, quantitative reduced representation and shotgun sequencing datasets. Using Qmatey to analyze shotgun data from a synthetic community with 35% of the 26 strains at low abundance (0.01–0.06%), we revealed a remarkable 85–96% strain recall and 92–100% species recall while maintaining 100% precision. Benchmarking revealed that the highly ranked Kraken2 and KrakenUniq tools identified 2–4 more taxa (92–100% recall) than Qmatey but produced 315–1752 false positive taxa and high penalty on precision (1–8%). The speed, accuracy and precision of the Qmatey pipeline positions it as a valuable tool for broad-spectrum profiling and for uncovering biologically relevant interactions. Oxford University Press 2023-10-11 /pmc/articles/PMC10569747/ /pubmed/37824740 http://dx.doi.org/10.1093/bib/bbad351 Text en © The Author(s) 2023. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Qmatey: an automated pipeline for fast exact matching-based alignment and strain-level taxonomic binning and profiling of metagenomes
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10569747/
https://www.ncbi.nlm.nih.gov/pubmed/37824740
http://dx.doi.org/10.1093/bib/bbad351