Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel

BACKGROUND: Aligning short DNA reads to a reference sequence alignment is a prerequisite for detecting their biological origin and analyzing them in a phylogenetic context. With the PaPaRa tool we introduced a dedicated dynamic programming algorithm for simultaneously aligning short reads to referen...

Bibliographic Details
Main authors: Alachiotis, Nikolaos; Berger, Simon A; Stamatakis, Alexandros
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3496624/
https://www.ncbi.nlm.nih.gov/pubmed/22876807
http://dx.doi.org/10.1186/1471-2105-13-196
author Alachiotis, Nikolaos
Berger, Simon A
Stamatakis, Alexandros
collection PubMed
description BACKGROUND: Aligning short DNA reads to a reference sequence alignment is a prerequisite for detecting their biological origin and analyzing them in a phylogenetic context. With the PaPaRa tool we introduced a dedicated dynamic programming algorithm for simultaneously aligning short reads to reference alignments and corresponding evolutionary reference trees. The algorithm aligns short reads to phylogenetic profiles that correspond to the branches of such a reference tree. The algorithm needs to perform an immense number of pairwise alignments. Therefore, we explore vector intrinsics and GPUs to accelerate the PaPaRa alignment kernel. RESULTS: We optimized and parallelized PaPaRa on CPUs and GPUs. Via SSE 4.1 SIMD (Single Instruction, Multiple Data) intrinsics for x86 SIMD architectures and multi-threading, we obtained a 9-fold acceleration on a single core as well as linear speedups with respect to the number of cores. The peak CPU performance amounts to 18.1 GCUPS (Giga Cell Updates per Second) using all four physical cores on an Intel i7 2600 CPU running at 3.4 GHz. The average CPU performance (averaged over all test runs) is 12.33 GCUPS. We also used OpenCL to execute PaPaRa on a GPU SIMT (Single Instruction, Multiple Threads) architecture. An NVIDIA GeForce 560 GPU delivered peak and average performance of 22.1 and 18.4 GCUPS respectively. Finally, we combined the SIMD and SIMT implementations into a hybrid CPU-GPU system that achieved an accumulated peak performance of 33.8 GCUPS. CONCLUSIONS: This accelerated version of PaPaRa (available at http://www.exelixis-lab.org/software.html) provides a significant performance improvement that allows for analyzing larger datasets in less time. We observe that state-of-the-art SIMD and SIMT architectures deliver comparable performance for this dynamic programming kernel when the “competing programmer approach” is deployed. Finally, we show that overall performance can be substantially increased by designing a hybrid CPU-GPU system with appropriate load distribution mechanisms.
format Online
Article
Text
id pubmed-3496624
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3496624 2012-11-19 Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel. Alachiotis, Nikolaos; Berger, Simon A; Stamatakis, Alexandros. BMC Bioinformatics. Research Article. BioMed Central 2012-08-09. /pmc/articles/PMC3496624/ /pubmed/22876807 http://dx.doi.org/10.1186/1471-2105-13-196 Text en
Copyright ©2012 Alachiotis et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel
topic Research Article