Big Data Workflows: Locality-Aware Orchestration Using Software Containers

The emergence of the edge computing paradigm has shifted data processing from centralised infrastructures to heterogeneous and geographically distributed infrastructures. Therefore, data processing solutions must consider data locality to reduce the performance penalties from data transfers among re...

Bibliographic Details
Main authors: Corodescu, Andrei-Alin; Nikolov, Nikolay; Khan, Akif Quddus; Soylu, Ahmet; Matskin, Mihhail; Payberah, Amir H.; Roman, Dumitru
Format: Online Article Text
Language: English
Published: MDPI 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8706844/
https://www.ncbi.nlm.nih.gov/pubmed/34960302
http://dx.doi.org/10.3390/s21248212
author Corodescu, Andrei-Alin
Nikolov, Nikolay
Khan, Akif Quddus
Soylu, Ahmet
Matskin, Mihhail
Payberah, Amir H.
Roman, Dumitru
author_sort Corodescu, Andrei-Alin
collection PubMed
description The emergence of the edge computing paradigm has shifted data processing from centralised infrastructures to heterogeneous and geographically distributed infrastructures. Therefore, data processing solutions must consider data locality to reduce the performance penalties from data transfers among remote data centres. Existing big data processing solutions provide limited support for handling data locality and are inefficient in processing small and frequent events specific to the edge environments. This article proposes a novel architecture and a proof-of-concept implementation for software container-centric big data workflow orchestration that puts data locality at the forefront. The proposed solution considers the available data locality information, leverages long-lived containers to execute workflow steps, and handles the interaction with different data sources through containers. We compare the proposed solution with Argo workflows and demonstrate a significant performance improvement in the execution speed for processing the same data units. Finally, we carry out experiments with the proposed solution under different configurations and analyze individual aspects affecting the performance of the overall solution.
format Online
Article
Text
id pubmed-8706844
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8706844 2021-12-25 Big Data Workflows: Locality-Aware Orchestration Using Software Containers Corodescu, Andrei-Alin Nikolov, Nikolay Khan, Akif Quddus Soylu, Ahmet Matskin, Mihhail Payberah, Amir H. Roman, Dumitru Sensors (Basel) Article The emergence of the edge computing paradigm has shifted data processing from centralised infrastructures to heterogeneous and geographically distributed infrastructures. Therefore, data processing solutions must consider data locality to reduce the performance penalties from data transfers among remote data centres. Existing big data processing solutions provide limited support for handling data locality and are inefficient in processing small and frequent events specific to the edge environments. This article proposes a novel architecture and a proof-of-concept implementation for software container-centric big data workflow orchestration that puts data locality at the forefront. The proposed solution considers the available data locality information, leverages long-lived containers to execute workflow steps, and handles the interaction with different data sources through containers. We compare the proposed solution with Argo workflows and demonstrate a significant performance improvement in the execution speed for processing the same data units. Finally, we carry out experiments with the proposed solution under different configurations and analyze individual aspects affecting the performance of the overall solution. MDPI 2021-12-08 /pmc/articles/PMC8706844/ /pubmed/34960302 http://dx.doi.org/10.3390/s21248212 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Big Data Workflows: Locality-Aware Orchestration Using Software Containers
topic Article