
DDS: integrating data analytics transformations in task-based workflows

High-performance data analytics (HPDA) is a current trend in e-science research that aims to integrate traditional HPC with recent data analytic frameworks. Most of the work done in this field has focused on improving data analytic frameworks by implementing their engines on top of HPC technologies...


Bibliographic Details
Main authors: Mammadli, Nihad; Ejarque, Jorge; Alvarez, Javier; Badia, Rosa M.
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10446079/
https://www.ncbi.nlm.nih.gov/pubmed/37645279
http://dx.doi.org/10.12688/openreseurope.14569.2
_version_ 1785094324213514240
author Mammadli, Nihad
Ejarque, Jorge
Alvarez, Javier
Badia, Rosa M.
collection PubMed
description High-performance data analytics (HPDA) is a current trend in e-science research that aims to integrate traditional HPC with recent data analytics frameworks. Most work in this field has focused on improving data analytics frameworks by implementing their engines on top of HPC technologies such as the Message Passing Interface (MPI). However, integration is lacking from an application development perspective: HPC workflows have their own parallel programming models, while data analytics (DA) algorithms are mainly implemented as data transformations and executed with frameworks such as Spark. Task-based programming models (TBPMs) are an efficient approach for implementing HPC workflows, and data analytics transformations can also be decomposed into a set of tasks and implemented with a TBPM. In this paper, we present a methodology for developing HPDA applications on top of TBPMs that allows developers to combine HPC workflows and data analytics transformations seamlessly. A prototype of this approach has been implemented on top of the PyCOMPSs task-based programming model to validate two aspects: that HPDA applications can be developed seamlessly and that they achieve better performance than Spark. We compare the results across different programs. Finally, we conclude with a discussion of integrating DA into HPC applications and of the evaluation of our method against Spark.
format Online
Article
Text
id pubmed-10446079
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-10446079 2023-08-29 DDS: integrating data analytics transformations in task-based workflows. Mammadli, Nihad; Ejarque, Jorge; Alvarez, Javier; Badia, Rosa M. Open Res Eur. Research Article. F1000 Research Limited, 2023-04-11. /pmc/articles/PMC10446079/ /pubmed/37645279 http://dx.doi.org/10.12688/openreseurope.14569.2 Text, en. Copyright: © 2023 Mammadli N et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title DDS: integrating data analytics transformations in task-based workflows
topic Research Article