Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

BACKGROUND: Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of N...

Bibliographic Details
Main author: Daily, Jeff
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4748600/
https://www.ncbi.nlm.nih.gov/pubmed/26864881
http://dx.doi.org/10.1186/s12859-016-0930-z
author Daily, Jeff
collection PubMed
description BACKGROUND: Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. RESULTS: A faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375-residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar’s ‘striped’ approach. Rognes’s SWIPE optimal database search application is still generally the fastest available, running 1.2 to at most 2.4 times faster than Parasail for sequences shorter than 500 amino acids. However, Parasail was faster for longer sequences. For global alignments, Parasail’s prefix scan implementation is generally the fastest, faster even than Farrar’s ‘striped’ approach; however, the opal library is faster for single-threaded applications. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. CONCLUSIONS: Applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0930-z) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4748600
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4748600 2016-02-11 Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments Daily, Jeff BMC Bioinformatics Software BioMed Central 2016-02-10 /pmc/articles/PMC4748600/ /pubmed/26864881 http://dx.doi.org/10.1186/s12859-016-0930-z Text en © Daily. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Daily, Jeff
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments
title Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4748600/
https://www.ncbi.nlm.nih.gov/pubmed/26864881
http://dx.doi.org/10.1186/s12859-016-0930-z