Big Data Workflows: Locality-Aware Orchestration Using Software Containers
The emergence of the edge computing paradigm has shifted data processing from centralised infrastructures to heterogeneous and geographically distributed infrastructures. Therefore, data processing solutions must consider data locality to reduce the performance penalties from data transfers among re...
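Main authors: | Corodescu, Andrei-Alin; Nikolov, Nikolay; Khan, Akif Quddus; Soylu, Ahmet; Matskin, Mihhail; Payberah, Amir H.; Roman, Dumitru |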
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8706844/ https://www.ncbi.nlm.nih.gov/pubmed/34960302 http://dx.doi.org/10.3390/s21248212 |
_version_ | 1784622291036930048 |
---|---|
author | Corodescu, Andrei-Alin Nikolov, Nikolay Khan, Akif Quddus Soylu, Ahmet Matskin, Mihhail Payberah, Amir H. Roman, Dumitru |
author_sort | Corodescu, Andrei-Alin |
collection | PubMed |
description | The emergence of the edge computing paradigm has shifted data processing from centralised infrastructures to heterogeneous and geographically distributed infrastructures. Therefore, data processing solutions must consider data locality to reduce the performance penalties from data transfers among remote data centres. Existing big data processing solutions provide limited support for handling data locality and are inefficient in processing small and frequent events specific to the edge environments. This article proposes a novel architecture and a proof-of-concept implementation for software container-centric big data workflow orchestration that puts data locality at the forefront. The proposed solution considers the available data locality information, leverages long-lived containers to execute workflow steps, and handles the interaction with different data sources through containers. We compare the proposed solution with Argo workflows and demonstrate a significant performance improvement in the execution speed for processing the same data units. Finally, we carry out experiments with the proposed solution under different configurations and analyze individual aspects affecting the performance of the overall solution. |
format | Online Article Text |
id | pubmed-8706844 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8706844 2021-12-25 Big Data Workflows: Locality-Aware Orchestration Using Software Containers Corodescu, Andrei-Alin Nikolov, Nikolay Khan, Akif Quddus Soylu, Ahmet Matskin, Mihhail Payberah, Amir H. Roman, Dumitru Sensors (Basel) Article The emergence of the edge computing paradigm has shifted data processing from centralised infrastructures to heterogeneous and geographically distributed infrastructures. Therefore, data processing solutions must consider data locality to reduce the performance penalties from data transfers among remote data centres. Existing big data processing solutions provide limited support for handling data locality and are inefficient in processing small and frequent events specific to the edge environments. This article proposes a novel architecture and a proof-of-concept implementation for software container-centric big data workflow orchestration that puts data locality at the forefront. The proposed solution considers the available data locality information, leverages long-lived containers to execute workflow steps, and handles the interaction with different data sources through containers. We compare the proposed solution with Argo workflows and demonstrate a significant performance improvement in the execution speed for processing the same data units. Finally, we carry out experiments with the proposed solution under different configurations and analyze individual aspects affecting the performance of the overall solution. MDPI 2021-12-08 /pmc/articles/PMC8706844/ /pubmed/34960302 http://dx.doi.org/10.3390/s21248212 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Corodescu, Andrei-Alin Nikolov, Nikolay Khan, Akif Quddus Soylu, Ahmet Matskin, Mihhail Payberah, Amir H. Roman, Dumitru Big Data Workflows: Locality-Aware Orchestration Using Software Containers |
title | Big Data Workflows: Locality-Aware Orchestration Using Software Containers |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8706844/ https://www.ncbi.nlm.nih.gov/pubmed/34960302 http://dx.doi.org/10.3390/s21248212 |