
PuffAligner: a fast, efficient and accurate aligner based on the Pufferfish index

MOTIVATION: Sequence alignment is one of the first steps in many modern genomic analyses, such as variant detection, transcript abundance estimation and metagenomic profiling. Unfortunately, it is often a computationally expensive procedure. As the quantity of data and wealth of different assays and...


Bibliographic Details
Main authors: Almodaresi, Fatemeh; Zakeri, Mohsen; Patro, Rob
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9502150/
https://www.ncbi.nlm.nih.gov/pubmed/34117875
http://dx.doi.org/10.1093/bioinformatics/btab408
author Almodaresi, Fatemeh
Zakeri, Mohsen
Patro, Rob
collection PubMed
description MOTIVATION: Sequence alignment is one of the first steps in many modern genomic analyses, such as variant detection, transcript abundance estimation and metagenomic profiling. Unfortunately, it is often a computationally expensive procedure. As the quantity of data and wealth of different assays and applications continue to grow, the need for accurate and fast alignment tools that scale to large collections of reference sequences persists.
RESULTS: In this article, we introduce PuffAligner, a fast, accurate and versatile aligner built on top of the Pufferfish index. PuffAligner is able to produce highly sensitive alignments, similar to those of Bowtie2, but much more quickly. While exhibiting similar speed to the ultrafast STAR aligner, PuffAligner requires considerably less memory to construct its index and align reads. PuffAligner strikes a desirable balance with respect to the time, space and accuracy tradeoffs made by different alignment tools and provides a promising foundation on which to test new alignment ideas over large collections of sequences.
AVAILABILITY AND IMPLEMENTATION: All the data used for preparing the results of this paper can be found with 10.5281/zenodo.4902332. PuffAligner is a free and open-source software. It is implemented in C++14 and can be obtained from https://github.com/COMBINE-lab/pufferfish/tree/cigar-strings.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9502150
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9502150 2022-09-26 PuffAligner: a fast, efficient and accurate aligner based on the Pufferfish index Almodaresi, Fatemeh; Zakeri, Mohsen; Patro, Rob Bioinformatics Original Papers Oxford University Press 2021-06-12 /pmc/articles/PMC9502150/ /pubmed/34117875 http://dx.doi.org/10.1093/bioinformatics/btab408 Text en © The Author(s) 2021. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title PuffAligner: a fast, efficient and accurate aligner based on the Pufferfish index
topic Original Papers