MTSv: rapid alignment-based taxonomic classification and high-confidence metagenomic analysis
As the size of reference sequence databases and high-throughput sequencing datasets continues to grow, it is becoming computationally infeasible to use traditional alignment to large genome databases for taxonomic classification of metagenomic reads. Exact matching approaches can rapidly assign taxon...
Main authors: | Furstenau, Tara N.; Schneider, Tsosie; Shaffer, Isaac; Vazquez, Adam J.; Sahl, Jason; Fofanov, Viacheslav |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2022 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9651046/ https://www.ncbi.nlm.nih.gov/pubmed/36389404 http://dx.doi.org/10.7717/peerj.14292 |
_version_ | 1784828158132879360 |
---|---|
author | Furstenau, Tara N. Schneider, Tsosie Shaffer, Isaac Vazquez, Adam J. Sahl, Jason Fofanov, Viacheslav |
author_facet | Furstenau, Tara N. Schneider, Tsosie Shaffer, Isaac Vazquez, Adam J. Sahl, Jason Fofanov, Viacheslav |
author_sort | Furstenau, Tara N. |
collection | PubMed |
description | As the size of reference sequence databases and high-throughput sequencing datasets continues to grow, it is becoming computationally infeasible to use traditional alignment to large genome databases for taxonomic classification of metagenomic reads. Exact matching approaches can rapidly assign taxonomy and summarize the composition of microbial communities, but they sacrifice accuracy and can lead to false positives. Full alignment tools provide higher-confidence assignments and can assign sequences from genomes that diverge from reference sequences; however, full alignment tools are computationally intensive. To address this, we designed MTSv specifically for alignment-based taxonomic assignment in metagenomic analysis. This tool implements an FM-index-assisted q-gram filter and a SIMD-accelerated Smith-Waterman algorithm to find alignments. However, unlike traditional aligners, MTSv will not attempt to make additional alignments to a TaxID once an alignment of sufficient quality has been found. This improves efficiency when many reference sequences are available per taxon. MTSv was designed to be flexible and can be modified to run on either memory- or processor-constrained systems. Although MTSv cannot compete with the speeds of exact k-mer matching approaches, it is reasonably fast and has higher precision than popular exact matching approaches. Because MTSv performs a full alignment, it can classify reads even when the genomes share low similarity with reference sequences, and it provides a tool for high-confidence pathogen detection with low off-target assignments to near-neighbor species. |
format | Online Article Text |
id | pubmed-9651046 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9651046 2022-11-15 MTSv: rapid alignment-based taxonomic classification and high-confidence metagenomic analysis Furstenau, Tara N. Schneider, Tsosie Shaffer, Isaac Vazquez, Adam J. Sahl, Jason Fofanov, Viacheslav PeerJ Bioinformatics As the size of reference sequence databases and high-throughput sequencing datasets continues to grow, it is becoming computationally infeasible to use traditional alignment to large genome databases for taxonomic classification of metagenomic reads. Exact matching approaches can rapidly assign taxonomy and summarize the composition of microbial communities, but they sacrifice accuracy and can lead to false positives. Full alignment tools provide higher-confidence assignments and can assign sequences from genomes that diverge from reference sequences; however, full alignment tools are computationally intensive. To address this, we designed MTSv specifically for alignment-based taxonomic assignment in metagenomic analysis. This tool implements an FM-index-assisted q-gram filter and a SIMD-accelerated Smith-Waterman algorithm to find alignments. However, unlike traditional aligners, MTSv will not attempt to make additional alignments to a TaxID once an alignment of sufficient quality has been found. This improves efficiency when many reference sequences are available per taxon. MTSv was designed to be flexible and can be modified to run on either memory- or processor-constrained systems. Although MTSv cannot compete with the speeds of exact k-mer matching approaches, it is reasonably fast and has higher precision than popular exact matching approaches. Because MTSv performs a full alignment, it can classify reads even when the genomes share low similarity with reference sequences, and it provides a tool for high-confidence pathogen detection with low off-target assignments to near-neighbor species. PeerJ Inc. 
2022-11-08 /pmc/articles/PMC9651046/ /pubmed/36389404 http://dx.doi.org/10.7717/peerj.14292 Text en © 2022 Furstenau et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Furstenau, Tara N. Schneider, Tsosie Shaffer, Isaac Vazquez, Adam J. Sahl, Jason Fofanov, Viacheslav MTSv: rapid alignment-based taxonomic classification and high-confidence metagenomic analysis |
title | MTSv: rapid alignment-based taxonomic classification and high-confidence metagenomic analysis |
title_full | MTSv: rapid alignment-based taxonomic classification and high-confidence metagenomic analysis |
title_fullStr | MTSv: rapid alignment-based taxonomic classification and high-confidence metagenomic analysis |
title_full_unstemmed | MTSv: rapid alignment-based taxonomic classification and high-confidence metagenomic analysis |
title_short | MTSv: rapid alignment-based taxonomic classification and high-confidence metagenomic analysis |
title_sort | mtsv: rapid alignment-based taxonomic classification and high-confidence metagenomic analysis |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9651046/ https://www.ncbi.nlm.nih.gov/pubmed/36389404 http://dx.doi.org/10.7717/peerj.14292 |
work_keys_str_mv | AT furstenautaran mtsvrapidalignmentbasedtaxonomicclassificationandhighconfidencemetagenomicanalysis AT schneidertsosie mtsvrapidalignmentbasedtaxonomicclassificationandhighconfidencemetagenomicanalysis AT shafferisaac mtsvrapidalignmentbasedtaxonomicclassificationandhighconfidencemetagenomicanalysis AT vazquezadamj mtsvrapidalignmentbasedtaxonomicclassificationandhighconfidencemetagenomicanalysis AT sahljason mtsvrapidalignmentbasedtaxonomicclassificationandhighconfidencemetagenomicanalysis AT fofanovviacheslav mtsvrapidalignmentbasedtaxonomicclassificationandhighconfidencemetagenomicanalysis |
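
The description above mentions a q-gram filter, Smith-Waterman alignment, and skipping further alignments to a TaxID once a sufficient alignment has been found. The following is a minimal illustrative sketch of that control flow only, not the MTSv implementation. It assumes a plain Python set intersection in place of the FM-index-assisted filter and a scalar Smith-Waterman in place of the SIMD-accelerated one. The function names, the scoring scheme, and the thresholds (`q`, `min_shared`, `min_score`) are all hypothetical.

```python
def qgrams(seq, q):
    """Return the set of all length-q substrings of seq."""
    return {seq[i:i + q] for i in range(len(seq) - q + 1)}


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score (linear gap penalty, scalar, no SIMD)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def classify_read(read, references, q=4, min_shared=2, min_score=None):
    """Assign a read to every TaxID with at least one sufficient alignment.

    references is a list of (taxid, sequence) pairs. All parameters are
    hypothetical defaults chosen for illustration.
    Returns (set of hit TaxIDs, number of full alignments performed).
    """
    if min_score is None:
        min_score = int(0.8 * len(read))  # assumed threshold
    read_qgrams = qgrams(read, q)
    hits = set()
    alignments = 0
    for taxid, ref in references:
        if taxid in hits:
            continue  # TaxID already has a sufficient alignment: skip
        if len(read_qgrams & qgrams(ref, q)) < min_shared:
            continue  # q-gram filter: too few shared seeds, no alignment
        alignments += 1
        if smith_waterman_score(read, ref) >= min_score:
            hits.add(taxid)
    return hits, alignments


references = [
    (1, "ACGTACGTAGCTAGCTAGGA"),
    (1, "ACGTACGTAGCTAGCTAGGATT"),  # same TaxID, never aligned
    (2, "TTTTTTTTTTTTTTTTTTTT"),    # rejected by the q-gram filter
    (3, "ACGTACGTAGCTAGGTAGGA"),    # diverged, still aligns well enough
]
hits, alignments = classify_read("ACGTACGTAGCTAGCT", references)
```

In this toy input, the second reference for TaxID 1 is skipped without alignment, which is the saving the description refers to when many reference sequences are available per taxon.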