Cargando…

MF-Storm: a maximum flow-based job scheduler for stream processing engines on computational clusters to increase throughput

BACKGROUND: A scheduling algorithm tries to schedule multiple computational tasks on a cluster of multiple computing nodes to maximize throughput with optimal utilization of computational and communicational resources. A Stream Processing Engine (SPE) is deployed to run streaming applications (compu...

Descripción completa

Detalles Bibliográficos
Autores principales: Muhammad, Asif, Abdul Qadir, Muhammad
Formato: Online Artículo Texto
Lenguaje:English
Publicado: PeerJ Inc. 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9575846/
https://www.ncbi.nlm.nih.gov/pubmed/36262141
http://dx.doi.org/10.7717/peerj-cs.1077
_version_ 1784811401567535104
author Muhammad, Asif
Abdul Qadir, Muhammad
author_facet Muhammad, Asif
Abdul Qadir, Muhammad
author_sort Muhammad, Asif
collection PubMed
description BACKGROUND: A scheduling algorithm tries to schedule multiple computational tasks on a cluster of multiple computing nodes to maximize throughput with optimal utilization of computational and communicational resources. A Stream Processing Engine (SPE) is deployed to run streaming applications (computational tasks) on a computational cluster which helps execution and coordination of these applications. It is observed that there is a gap in the optimal mapping of a computational and communicational load of a streaming application on the underlying computational and communication power of the resources (cluster). Frequently communicated tasks are scheduled at different processing nodes with relatively slow communicating links. This increases network latency with a decrease in resource utilization. Hence, reduces the achieved throughput of the cluster significantly. METHODS: MF-Storm, a max-flow min-cut based job scheduler is presented to achieve a near-optimum schedule to maximize throughput. It schedules a streaming application by considering the processing, communication demands, available computational and communicational resources in a heterogeneous cluster, dynamically with minimized scheduling cost. To keep the scheduling cost minimum, the scheduler is built in a pipeline with two major stages: in the first stage, the application’s tasks graph is partitioned using the max-flow min-cut algorithm to minimize inter-partition traffic, and in the second stage, these partitions are assigned to computing nodes according to the computational power of the cluster’s nodes. RESULTS: Extensive experiments were done to evaluate the performance of MF-Storm using different topologies with multiple scenarios on a physical cluster implementation. Results showed on average 148% improvement in throughput with 30% less computational resources as compared to different state-of-the-art schedulers.
format Online
Article
Text
id pubmed-9575846
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-95758462022-10-18 MF-Storm: a maximum flow-based job scheduler for stream processing engines on computational clusters to increase throughput Muhammad, Asif Abdul Qadir, Muhammad PeerJ Comput Sci Algorithms and Analysis of Algorithms BACKGROUND: A scheduling algorithm tries to schedule multiple computational tasks on a cluster of multiple computing nodes to maximize throughput with optimal utilization of computational and communicational resources. A Stream Processing Engine (SPE) is deployed to run streaming applications (computational tasks) on a computational cluster which helps execution and coordination of these applications. It is observed that there is a gap in the optimal mapping of a computational and communicational load of a streaming application on the underlying computational and communication power of the resources (cluster). Frequently communicated tasks are scheduled at different processing nodes with relatively slow communicating links. This increases network latency with a decrease in resource utilization. Hence, reduces the achieved throughput of the cluster significantly. METHODS: MF-Storm, a max-flow min-cut based job scheduler is presented to achieve a near-optimum schedule to maximize throughput. It schedules a streaming application by considering the processing, communication demands, available computational and communicational resources in a heterogeneous cluster, dynamically with minimized scheduling cost. To keep the scheduling cost minimum, the scheduler is built in a pipeline with two major stages: in the first stage, the application’s tasks graph is partitioned using the max-flow min-cut algorithm to minimize inter-partition traffic, and in the second stage, these partitions are assigned to computing nodes according to the computational power of the cluster’s nodes. RESULTS: Extensive experiments were done to evaluate the performance of MF-Storm using different topologies with multiple scenarios on a physical cluster implementation. Results showed on average 148% improvement in throughput with 30% less computational resources as compared to different state-of-the-art schedulers. PeerJ Inc. 2022-09-26 /pmc/articles/PMC9575846/ /pubmed/36262141 http://dx.doi.org/10.7717/peerj-cs.1077 Text en © 2022 Muhammad and Abdul Qadir https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
spellingShingle Algorithms and Analysis of Algorithms
Muhammad, Asif
Abdul Qadir, Muhammad
MF-Storm: a maximum flow-based job scheduler for stream processing engines on computational clusters to increase throughput
title MF-Storm: a maximum flow-based job scheduler for stream processing engines on computational clusters to increase throughput
title_full MF-Storm: a maximum flow-based job scheduler for stream processing engines on computational clusters to increase throughput
title_fullStr MF-Storm: a maximum flow-based job scheduler for stream processing engines on computational clusters to increase throughput
title_full_unstemmed MF-Storm: a maximum flow-based job scheduler for stream processing engines on computational clusters to increase throughput
title_short MF-Storm: a maximum flow-based job scheduler for stream processing engines on computational clusters to increase throughput
title_sort mf-storm: a maximum flow-based job scheduler for stream processing engines on computational clusters to increase throughput
topic Algorithms and Analysis of Algorithms
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9575846/
https://www.ncbi.nlm.nih.gov/pubmed/36262141
http://dx.doi.org/10.7717/peerj-cs.1077
work_keys_str_mv AT muhammadasif mfstormamaximumflowbasedjobschedulerforstreamprocessingenginesoncomputationalclusterstoincreasethroughput
AT abdulqadirmuhammad mfstormamaximumflowbasedjobschedulerforstreamprocessingenginesoncomputationalclusterstoincreasethroughput