SparkBLAST: scalable BLAST processing using in-memory operations

BACKGROUND: The demand for processing ever increasing amounts of genomic data has raised new challenges for the implementation of highly scalable and efficient computational systems. In this paper we propose SparkBLAST, a parallelization of a sequence alignment application (BLAST) that employs cloud...

Bibliographic Details
Main authors:	de Castro, Marcelo Rodrigo; Tostes, Catherine dos Santos; Dávila, Alberto M. R.; Senger, Hermes; da Silva, Fabricio A. B.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5488373/ https://www.ncbi.nlm.nih.gov/pubmed/28655296 http://dx.doi.org/10.1186/s12859-017-1723-8

author	de Castro, Marcelo Rodrigo; Tostes, Catherine dos Santos; Dávila, Alberto M. R.; Senger, Hermes; da Silva, Fabricio A. B.
author_sort	de Castro, Marcelo Rodrigo
collection	PubMed
description	BACKGROUND: The demand for processing ever increasing amounts of genomic data has raised new challenges for the implementation of highly scalable and efficient computational systems. In this paper we propose SparkBLAST, a parallelization of a sequence alignment application (BLAST) that employs cloud computing for the provisioning of computational resources and Apache Spark as the coordination framework. As a proof of concept, some radionuclide-resistant bacterial genomes were selected for similarity analysis. RESULTS: Experiments in Google and Microsoft Azure clouds demonstrated that SparkBLAST outperforms an equivalent system implemented on Hadoop in terms of speedup and execution times. CONCLUSIONS: The superior performance of SparkBLAST is mainly due to the in-memory operations available through the Spark framework, consequently reducing the number of local I/O operations required for distributed BLAST processing. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1723-8) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5488373
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5488373 2017-07-03. SparkBLAST: scalable BLAST processing using in-memory operations. de Castro, Marcelo Rodrigo; Tostes, Catherine dos Santos; Dávila, Alberto M. R.; Senger, Hermes; da Silva, Fabricio A. B. BMC Bioinformatics. Software. BioMed Central, 2017-06-27. /pmc/articles/PMC5488373/ /pubmed/28655296 http://dx.doi.org/10.1186/s12859-017-1723-8 Text, en. © The Author(s) 2017. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	SparkBLAST: scalable BLAST processing using in-memory operations
topic	Software

Similar items