
SparkBLAST: scalable BLAST processing using in-memory operations


Detalles Bibliográficos
Autores principales: de Castro, Marcelo Rodrigo, Tostes, Catherine dos Santos, Dávila, Alberto M. R., Senger, Hermes, da Silva, Fabricio A. B.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5488373/
https://www.ncbi.nlm.nih.gov/pubmed/28655296
http://dx.doi.org/10.1186/s12859-017-1723-8
Description
Summary: BACKGROUND: The demand for processing ever-increasing amounts of genomic data has raised new challenges for the implementation of highly scalable and efficient computational systems. In this paper, we propose SparkBLAST, a parallelization of a sequence alignment application (BLAST) that employs cloud computing for the provisioning of computational resources and Apache Spark as the coordination framework. As a proof of concept, several radionuclide-resistant bacterial genomes were selected for similarity analysis. RESULTS: Experiments in the Google and Microsoft Azure clouds demonstrated that SparkBLAST outperforms an equivalent system implemented on Hadoop in terms of speedup and execution time. CONCLUSIONS: The superior performance of SparkBLAST is mainly due to the in-memory operations available through the Spark framework, which reduce the number of local I/O operations required for distributed BLAST processing. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1723-8) contains supplementary material, which is available to authorized users.