Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel
BACKGROUND: Aligning short DNA reads to a reference sequence alignment is a prerequisite for detecting their biological origin and analyzing them in a phylogenetic context. With the PaPaRa tool we introduced a dedicated dynamic programming algorithm for simultaneously aligning short reads to referen...
Main authors: | Alachiotis, Nikolaos; Berger, Simon A; Stamatakis, Alexandros |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3496624/ https://www.ncbi.nlm.nih.gov/pubmed/22876807 http://dx.doi.org/10.1186/1471-2105-13-196 |
_version_ | 1782249651518308352 |
---|---|
author | Alachiotis, Nikolaos Berger, Simon A Stamatakis, Alexandros |
author_facet | Alachiotis, Nikolaos Berger, Simon A Stamatakis, Alexandros |
author_sort | Alachiotis, Nikolaos |
collection | PubMed |
description | BACKGROUND: Aligning short DNA reads to a reference sequence alignment is a prerequisite for detecting their biological origin and analyzing them in a phylogenetic context. With the PaPaRa tool we introduced a dedicated dynamic programming algorithm for simultaneously aligning short reads to reference alignments and corresponding evolutionary reference trees. The algorithm aligns short reads to phylogenetic profiles that correspond to the branches of such a reference tree. The algorithm needs to perform an immense number of pairwise alignments. Therefore, we explore vector intrinsics and GPUs to accelerate the PaPaRa alignment kernel. RESULTS: We optimized and parallelized PaPaRa on CPUs and GPUs. Via SSE 4.1 SIMD (Single Instruction, Multiple Data) intrinsics for x86 SIMD architectures and multi-threading, we obtained a 9-fold acceleration on a single core as well as linear speedups with respect to the number of cores. The peak CPU performance amounts to 18.1 GCUPS (Giga Cell Updates per Second) using all four physical cores on an Intel i7 2600 CPU running at 3.4 GHz. The average CPU performance (averaged over all test runs) is 12.33 GCUPS. We also used OpenCL to execute PaPaRa on a GPU SIMT (Single Instruction, Multiple Threads) architecture. A NVIDIA GeForce 560 GPU delivered peak and average performance of 22.1 and 18.4 GCUPS respectively. Finally, we combined the SIMD and SIMT implementations into a hybrid CPU-GPU system that achieved an accumulated peak performance of 33.8 GCUPS. CONCLUSIONS: This accelerated version of PaPaRa (available at http://www.exelixis-lab.org/software.html) provides a significant performance improvement that allows for analyzing larger datasets in less time. We observe that state-of-the-art SIMD and SIMT architectures deliver comparable performance for this dynamic programming kernel when the “competing programmer approach” is deployed. Finally, we show that overall performance can be substantially increased by designing a hybrid CPU-GPU system with appropriate load distribution mechanisms. |
format | Online Article Text |
id | pubmed-3496624 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3496624 2012-11-19 Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel Alachiotis, Nikolaos Berger, Simon A Stamatakis, Alexandros BMC Bioinformatics Research Article BACKGROUND: Aligning short DNA reads to a reference sequence alignment is a prerequisite for detecting their biological origin and analyzing them in a phylogenetic context. With the PaPaRa tool we introduced a dedicated dynamic programming algorithm for simultaneously aligning short reads to reference alignments and corresponding evolutionary reference trees. The algorithm aligns short reads to phylogenetic profiles that correspond to the branches of such a reference tree. The algorithm needs to perform an immense number of pairwise alignments. Therefore, we explore vector intrinsics and GPUs to accelerate the PaPaRa alignment kernel. RESULTS: We optimized and parallelized PaPaRa on CPUs and GPUs. Via SSE 4.1 SIMD (Single Instruction, Multiple Data) intrinsics for x86 SIMD architectures and multi-threading, we obtained a 9-fold acceleration on a single core as well as linear speedups with respect to the number of cores. The peak CPU performance amounts to 18.1 GCUPS (Giga Cell Updates per Second) using all four physical cores on an Intel i7 2600 CPU running at 3.4 GHz. The average CPU performance (averaged over all test runs) is 12.33 GCUPS. We also used OpenCL to execute PaPaRa on a GPU SIMT (Single Instruction, Multiple Threads) architecture. A NVIDIA GeForce 560 GPU delivered peak and average performance of 22.1 and 18.4 GCUPS respectively. Finally, we combined the SIMD and SIMT implementations into a hybrid CPU-GPU system that achieved an accumulated peak performance of 33.8 GCUPS. CONCLUSIONS: This accelerated version of PaPaRa (available at http://www.exelixis-lab.org/software.html) provides a significant performance improvement that allows for analyzing larger datasets in less time. We observe that state-of-the-art SIMD and SIMT architectures deliver comparable performance for this dynamic programming kernel when the “competing programmer approach” is deployed. Finally, we show that overall performance can be substantially increased by designing a hybrid CPU-GPU system with appropriate load distribution mechanisms. BioMed Central 2012-08-09 /pmc/articles/PMC3496624/ /pubmed/22876807 http://dx.doi.org/10.1186/1471-2105-13-196 Text en Copyright ©2012 Alachiotis et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Alachiotis, Nikolaos Berger, Simon A Stamatakis, Alexandros Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel |
title | Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel |
title_full | Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel |
title_fullStr | Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel |
title_full_unstemmed | Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel |
title_short | Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel |
title_sort | coupling simd and simt architectures to boost performance of a phylogeny-aware alignment kernel |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3496624/ https://www.ncbi.nlm.nih.gov/pubmed/22876807 http://dx.doi.org/10.1186/1471-2105-13-196 |
work_keys_str_mv | AT alachiotisnikolaos couplingsimdandsimtarchitecturestoboostperformanceofaphylogenyawarealignmentkernel AT bergersimona couplingsimdandsimtarchitecturestoboostperformanceofaphylogenyawarealignmentkernel AT stamatakisalexandros couplingsimdandsimtarchitecturestoboostperformanceofaphylogenyawarealignmentkernel |