MTSv: rapid alignment-based taxonomic classification and high-confidence metagenomic analysis

As the size of reference sequence databases and high-throughput sequencing datasets continues to grow, it is becoming computationally infeasible to use traditional alignment to large genome databases for taxonomic classification of metagenomic reads. Exact matching approaches can rapidly assign taxon...

Bibliographic Details
Main authors: Furstenau, Tara N., Schneider, Tsosie, Shaffer, Isaac, Vazquez, Adam J., Sahl, Jason, Fofanov, Viacheslav
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9651046/
https://www.ncbi.nlm.nih.gov/pubmed/36389404
http://dx.doi.org/10.7717/peerj.14292
_version_ 1784828158132879360
author Furstenau, Tara N.
Schneider, Tsosie
Shaffer, Isaac
Vazquez, Adam J.
Sahl, Jason
Fofanov, Viacheslav
collection PubMed
description As the size of reference sequence databases and high-throughput sequencing datasets continues to grow, it is becoming computationally infeasible to use traditional alignment to large genome databases for taxonomic classification of metagenomic reads. Exact matching approaches can rapidly assign taxonomy and summarize the composition of microbial communities, but they sacrifice accuracy and can lead to false positives. Full alignment tools provide higher-confidence assignments and can assign sequences from genomes that diverge from reference sequences; however, they are computationally intensive. To address this, we designed MTSv specifically for alignment-based taxonomic assignment in metagenomic analysis. The tool implements an FM-index-assisted q-gram filter and a SIMD-accelerated Smith-Waterman algorithm to find alignments. Unlike traditional aligners, however, MTSv will not attempt additional alignments to a TaxID once an alignment of sufficient quality has been found, which improves efficiency when many reference sequences are available per taxon. MTSv was designed to be flexible and can be modified to run on either memory-constrained or processor-constrained systems. Although MTSv cannot compete with the speed of exact k-mer matching approaches, it is reasonably fast and has higher precision than popular exact matching approaches. Because MTSv performs a full alignment, it can classify reads even when their source genomes share low similarity with reference sequences, and it provides a tool for high-confidence pathogen detection with few off-target assignments to near-neighbor species.
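The filter-then-align strategy outlined in the description can be illustrated with a minimal Python sketch. This is not MTSv's implementation: a plain dictionary stands in for the FM-index, a pure-Python scoring loop stands in for the SIMD-accelerated Smith-Waterman step, and every name, parameter, and threshold (build_qgram_index, classify_read, min_hits, min_score, the toy sequences and TaxIDs) is a hypothetical chosen for illustration. The point it shows is the early stop: once one reference for a TaxID aligns well enough, the remaining references for that TaxID are skipped.

```python
from collections import defaultdict


def build_qgram_index(references, q):
    """Map each q-gram to the ids of the references containing it.

    Assumption: a dict-based index replaces the FM-index named in the abstract.
    """
    index = defaultdict(set)
    for ref_id, (_taxid, seq) in references.items():
        for i in range(len(seq) - q + 1):
            index[seq[i:i + q]].add(ref_id)
    return index


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (no SIMD)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def classify_read(read, references, index, q, min_hits, min_score):
    """Return the TaxIDs that have at least one sufficiently good alignment."""
    # Filter step: count shared q-grams per candidate reference.
    hits = defaultdict(int)
    for i in range(len(read) - q + 1):
        for ref_id in index.get(read[i:i + q], ()):
            hits[ref_id] += 1

    assigned = set()
    # Try the most promising candidates first.
    for ref_id, count in sorted(hits.items(), key=lambda kv: -kv[1]):
        taxid, seq = references[ref_id]
        if taxid in assigned:
            continue  # early stop: this TaxID already has a good alignment
        if count < min_hits:
            continue  # too few shared q-grams to justify a full alignment
        if smith_waterman_score(read, seq) >= min_score:
            assigned.add(taxid)
    return assigned


# Toy data: two references for TaxID 100, one for TaxID 200 (all hypothetical).
references = {
    "r1": (100, "ACGTACGTTTGACCA"),
    "r2": (100, "ACGTACGTTTGACCA"),
    "r3": (200, "GGGGGCCCCCAAAAATTTTT"),
}
index = build_qgram_index(references, q=4)
result = classify_read("ACGTACGTTTGA", references, index,
                       q=4, min_hits=3, min_score=20)
print(result)  # prints {100}; r2 is never aligned because r1 already matched
```

In this toy run the second reference for TaxID 100 passes the q-gram filter but is never aligned, which is the saving the description attributes to databases with many reference sequences per taxon.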
format Online
Article
Text
id pubmed-9651046
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-9651046 2022-11-15 MTSv: rapid alignment-based taxonomic classification and high-confidence metagenomic analysis Furstenau, Tara N. Schneider, Tsosie Shaffer, Isaac Vazquez, Adam J. Sahl, Jason Fofanov, Viacheslav PeerJ Bioinformatics PeerJ Inc. 2022-11-08 /pmc/articles/PMC9651046/ /pubmed/36389404 http://dx.doi.org/10.7717/peerj.14292 Text en © 2022 Furstenau et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title MTSv: rapid alignment-based taxonomic classification and high-confidence metagenomic analysis
topic Bioinformatics