Cargando…

Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment be...

Descripción completa

Detalles Bibliográficos
Autores principales: Shrimankar, D. D., Sathe, S. R.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Libertas Academica 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5135121/
https://www.ncbi.nlm.nih.gov/pubmed/27932868
http://dx.doi.org/10.4137/BBI.S40744
_version_ 1782471571326107648
author Shrimankar, D. D.
Sathe, S. R.
author_facet Shrimankar, D. D.
Sathe, S. R.
author_sort Shrimankar, D. D.
collection PubMed
description Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures.
format Online
Article
Text
id pubmed-5135121
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-51351212016-12-08 Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets Shrimankar, D. D. Sathe, S. R. Bioinform Biol Insights Methodology Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures. Libertas Academica 2016-11-30 /pmc/articles/PMC5135121/ /pubmed/27932868 http://dx.doi.org/10.4137/BBI.S40744 Text en © 2016 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.
spellingShingle Methodology
Shrimankar, D. D.
Sathe, S. R.
Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets
title Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets
title_full Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets
title_fullStr Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets
title_full_unstemmed Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets
title_short Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets
title_sort analysis of parallel algorithms on smp node and cluster of workstations using parallel programming models with new tile-based method for large biological datasets
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5135121/
https://www.ncbi.nlm.nih.gov/pubmed/27932868
http://dx.doi.org/10.4137/BBI.S40744
work_keys_str_mv AT shrimankardd analysisofparallelalgorithmsonsmpnodeandclusterofworkstationsusingparallelprogrammingmodelswithnewtilebasedmethodforlargebiologicaldatasets
AT sathesr analysisofparallelalgorithmsonsmpnodeandclusterofworkstationsusingparallelprogrammingmodelswithnewtilebasedmethodforlargebiologicaldatasets