Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLA...

Bibliographic Details
Main authors: Yim, Won Cheol, Cushman, John C.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2017
Subjects: Agricultural Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5483034/
https://www.ncbi.nlm.nih.gov/pubmed/28652936
http://dx.doi.org/10.7717/peerj.3486
author Yim, Won Cheol
Cushman, John C.
collection PubMed
description Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, it has the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. To accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. This freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
format Online
Article
Text
id pubmed-5483034
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-5483034 2017-06-26 Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments Yim, Won Cheol Cushman, John C. PeerJ Agricultural Science PeerJ Inc. 2017-06-22 /pmc/articles/PMC5483034/ /pubmed/28652936 http://dx.doi.org/10.7717/peerj.3486 Text en ©2017 Yim and Cushman http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments
topic Agricultural Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5483034/
https://www.ncbi.nlm.nih.gov/pubmed/28652936
http://dx.doi.org/10.7717/peerj.3486