Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters

BACKGROUND: Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases establishes the need for efficient parallel implementations of these computations on modern accelerators. RESULTS: This...

Bibliographic Details
Main authors: Lan, Haidong, Chan, Yuandong, Xu, Kai, Schmidt, Bertil, Peng, Shaoliang, Liu, Weiguo
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4959381/
https://www.ncbi.nlm.nih.gov/pubmed/27455061
http://dx.doi.org/10.1186/s12859-016-1128-0
author Lan, Haidong
Chan, Yuandong
Xu, Kai
Schmidt, Bertil
Peng, Shaoliang
Liu, Weiguo
collection PubMed
description BACKGROUND: Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases establishes the need for efficient parallel implementations of these computations on modern accelerators. RESULTS: This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm and to the first stage of progressive multiple sequence alignment based on the ClustalW heuristic, both on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture, i.e., cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. CONCLUSIONS: Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes, for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.
format Online
Article
Text
id pubmed-4959381
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4959381 2016-08-01 Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters Lan, Haidong Chan, Yuandong Xu, Kai Schmidt, Bertil Peng, Shaoliang Liu, Weiguo BMC Bioinformatics Research
BioMed Central 2016-07-19 /pmc/articles/PMC4959381/ /pubmed/27455061 http://dx.doi.org/10.1186/s12859-016-1128-0 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters
topic Research