
MRPack: Multi-Algorithm Execution Using Compute-Intensive Approach in MapReduce

Large quantities of data have been generated from multiple sources at exponential rates in the last few years. These data are generated at high velocity as real-time and streaming data in a variety of formats. These characteristics give rise to challenges in their modeling, computation, and processing....


Bibliographic Details
Main authors:	Idris, Muhammad; Hussain, Shujaat; Siddiqi, Muhammad Hameed; Hassan, Waseem; Syed Muhammad Bilal, Hafiz; Lee, Sungyoung
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2015
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4549337/ https://www.ncbi.nlm.nih.gov/pubmed/26305223 http://dx.doi.org/10.1371/journal.pone.0136259

author	Idris, Muhammad; Hussain, Shujaat; Siddiqi, Muhammad Hameed; Hassan, Waseem; Syed Muhammad Bilal, Hafiz; Lee, Sungyoung
author_sort	Idris, Muhammad
collection	PubMed
description	Large quantities of data have been generated from multiple sources at exponential rates in the last few years. These data are generated at high velocity as real-time and streaming data in a variety of formats. These characteristics give rise to challenges in their modeling, computation, and processing. Hadoop MapReduce (MR) is a well-known data-intensive distributed processing framework that uses the distributed file system (DFS) for Big Data. Current implementations of MR support execution of only a single algorithm in the entire Hadoop cluster. In this paper, we propose MapReducePack (MRPack), a variation of MR that supports execution of a set of related algorithms in a single MR job. We exploit the computational capability of a cluster by increasing the compute-intensiveness of MapReduce while maintaining its data-intensive approach. MRPack uses the available computing resources by dynamically managing the task assignment and intermediate data. Intermediate data from multiple algorithms are managed using multi-key and skew mitigation strategies. The performance study of the proposed system shows that it is time, I/O, and memory efficient compared to the default MapReduce. The proposed approach reduces the execution time by 200% with an approximate 50% decrease in I/O cost. Complexity and qualitative results analysis shows significant performance improvement.
format	Online Article Text
id	pubmed-4549337
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4549337 2015-09-01 PLoS One Research Article Public Library of Science 2015-08-25 /pmc/articles/PMC4549337/ /pubmed/26305223 http://dx.doi.org/10.1371/journal.pone.0136259 Text en © 2015 Idris et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	MRPack: Multi-Algorithm Execution Using Compute-Intensive Approach in MapReduce
topic	Research Article
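
The abstract describes running a set of related algorithms in a single MR job and managing their intermediate data with a multi-key strategy. The following is a minimal in-memory sketch of that general idea, not the authors' implementation and not Hadoop code: each record is read once, passed to every algorithm's map function, and the emitted keys are tagged with an algorithm identifier so that the reduce phase can keep the outputs apart. All names here (`run_packed_job`, the two example algorithms, the `(algo_id, key)` composite key layout) are assumptions made for illustration. Task assignment and skew mitigation are not modeled.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical types: an "algorithm" is a (map function, reduce function) pair.
MapFn = Callable[[str], Iterable[Tuple[str, int]]]
ReduceFn = Callable[[str, List[int]], int]


def word_count_map(line: str) -> Iterable[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word."""
    for word in line.split():
        yield word.lower(), 1


def sum_reduce(key: str, values: List[int]) -> int:
    return sum(values)


def line_length_map(line: str) -> Iterable[Tuple[str, int]]:
    """Emit the length of the line under a single fixed key."""
    yield "max_len", len(line)


def max_reduce(key: str, values: List[int]) -> int:
    return max(values)


def run_packed_job(
    records: Iterable[str],
    algorithms: Dict[str, Tuple[MapFn, ReduceFn]],
) -> Dict[str, Dict[str, int]]:
    """Run several algorithms over the input in one pass (illustrative only)."""
    # Map phase: each record is read once and handed to every algorithm.
    # The composite key (algo_id, key) keeps intermediate data separate.
    intermediate: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for record in records:
        for algo_id, (map_fn, _) in algorithms.items():
            for key, value in map_fn(record):
                intermediate[(algo_id, key)].append(value)

    # Reduce phase: dispatch each group to the reducer of its algorithm.
    results: Dict[str, Dict[str, int]] = {algo_id: {} for algo_id in algorithms}
    for (algo_id, key), values in intermediate.items():
        _, reduce_fn = algorithms[algo_id]
        results[algo_id][key] = reduce_fn(key, values)
    return results


if __name__ == "__main__":
    lines = ["big data big cluster", "data"]
    packed = {
        "wordcount": (word_count_map, sum_reduce),
        "maxlen": (line_length_map, max_reduce),
    }
    print(run_packed_job(lines, packed))
```

In this sketch the input is scanned once regardless of how many algorithms are packed into the job, which is the kind of saving the abstract attributes to sharing a single MR job; the composite key is one simple way to keep the intermediate data of different algorithms from colliding.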
