SparkBLAST: scalable BLAST processing using in-memory operations
BACKGROUND: The demand for processing ever-increasing amounts of genomic data has raised new challenges for the implementation of highly scalable and efficient computational systems. In this paper we propose SparkBLAST, a parallelization of a sequence alignment application (BLAST) that employs cloud...
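Main Authors: | de Castro, Marcelo Rodrigo; Tostes, Catherine dos Santos; Dávila, Alberto M. R.; Senger, Hermes; da Silva, Fabricio A. B. |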
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5488373/ https://www.ncbi.nlm.nih.gov/pubmed/28655296 http://dx.doi.org/10.1186/s12859-017-1723-8 |
_version_ | 1783246639995027456 |
---|---|
author | de Castro, Marcelo Rodrigo; Tostes, Catherine dos Santos; Dávila, Alberto M. R.; Senger, Hermes; da Silva, Fabricio A. B. |
author_sort | de Castro, Marcelo Rodrigo |
collection | PubMed |
description | BACKGROUND: The demand for processing ever-increasing amounts of genomic data has raised new challenges for the implementation of highly scalable and efficient computational systems. In this paper we propose SparkBLAST, a parallelization of a sequence alignment application (BLAST) that employs cloud computing for the provisioning of computational resources and Apache Spark as the coordination framework. As a proof of concept, some radionuclide-resistant bacterial genomes were selected for similarity analysis. RESULTS: Experiments in Google and Microsoft Azure clouds demonstrated that SparkBLAST outperforms an equivalent system implemented on Hadoop in terms of speedup and execution times. CONCLUSIONS: The superior performance of SparkBLAST is mainly due to the in-memory operations available through the Spark framework, consequently reducing the number of local I/O operations required for distributed BLAST processing. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1723-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5488373 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-54883732017-07-03 SparkBLAST: scalable BLAST processing using in-memory operations de Castro, Marcelo Rodrigo Tostes, Catherine dos Santos Dávila, Alberto M. R. Senger, Hermes da Silva, Fabricio A. B. BMC Bioinformatics Software BACKGROUND: The demand for processing ever increasing amounts of genomic data has raised new challenges for the implementation of highly scalable and efficient computational systems. In this paper we propose SparkBLAST, a parallelization of a sequence alignment application (BLAST) that employs cloud computing for the provisioning of computational resources and Apache Spark as the coordination framework. As a proof of concept, some radionuclide-resistant bacterial genomes were selected for similarity analysis. RESULTS: Experiments in Google and Microsoft Azure clouds demonstrated that SparkBLAST outperforms an equivalent system implemented on Hadoop in terms of speedup and execution times. CONCLUSIONS: The superior performance of SparkBLAST is mainly due to the in-memory operations available through the Spark framework, consequently reducing the number of local I/O operations required for distributed BLAST processing. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1723-8) contains supplementary material, which is available to authorized users. BioMed Central 2017-06-27 /pmc/articles/PMC5488373/ /pubmed/28655296 http://dx.doi.org/10.1186/s12859-017-1723-8 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software de Castro, Marcelo Rodrigo Tostes, Catherine dos Santos Dávila, Alberto M. R. Senger, Hermes da Silva, Fabricio A. B. SparkBLAST: scalable BLAST processing using in-memory operations |
title | SparkBLAST: scalable BLAST processing using in-memory operations |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5488373/ https://www.ncbi.nlm.nih.gov/pubmed/28655296 http://dx.doi.org/10.1186/s12859-017-1723-8 |