Multi-criteria partitioning on distributed file systems for efficient accelerator data analysis and performance optimization

Bibliographic details
Main authors: Boychenko, Serhiy; Galilee, Marc-Antoine; Garnier, Jean-Christophe; Zenha-Rela, Mario; Zerlauth, Markus
Language: English
Published: 2018
Subjects: Accelerators and Storage Rings
Online access: https://dx.doi.org/10.18429/JACoW-ICALEPCS2017-THPHA036
http://cds.cern.ch/record/2305508
collection CERN
description Since the introduction of the MapReduce paradigm, relational databases have increasingly been replaced by more efficient and scalable architectures, particularly in environments where a single query execution processes terabytes or even petabytes of data. The same trend is observed at CERN, where the data archiving systems for operational accelerator data are already working well beyond their initially provisioned capacity. Most modern data analysis frameworks are not optimized for heterogeneous workloads such as those arising in the dynamic environment of one of the world's largest accelerator complexes. This contribution presents Mixed Partitioning Scheme Replication (MPSR), a solution that outperforms conventional distributed processing environment configurations for almost the entire phase space of data analysis use cases and performance optimization challenges that arise during the commissioning and operational phases of an accelerator. We present the results of a statistical analysis and the benchmarks of the implemented prototype, which characterize the proposed approach and confirm the expected performance gains.
id oai-inspirehep.net-1656374
institution European Organization for Nuclear Research
record_format invenio
topic Accelerators and Storage Rings
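The abstract does not describe how MPSR works internally. The sketch below illustrates one plausible reading of "multi-criteria partitioning" with replication. It assumes that several full replicas of the same data are kept, each partitioned by a different key, and that each query is routed to the replica that lets it read the fewest partitions. The record fields (`device`, `day`), the class and function names, and the routing rule are hypothetical illustrations and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    device: str   # hypothetical: device that produced the value
    day: int      # hypothetical: acquisition day, used as a coarse time bucket
    value: float


@dataclass(frozen=True)
class Query:
    device: Optional[str] = None  # filter on device; None means "any device"
    day: Optional[int] = None     # filter on day; None means "any day"

    def matches(self, r: Record) -> bool:
        return (self.device is None or r.device == self.device) and (
            self.day is None or r.day == self.day
        )


class Replica:
    """One full copy of the data, partitioned by a single key (assumption)."""

    def __init__(self, key: str, records: List[Record]):
        self.key = key
        self.partitions: Dict[object, List[Record]] = defaultdict(list)
        for r in records:
            self.partitions[getattr(r, key)].append(r)

    def partitions_for(self, q: Query) -> List[List[Record]]:
        wanted = getattr(q, self.key)
        if wanted is None:
            # The query does not filter on this replica's key: full scan.
            return list(self.partitions.values())
        # Partition pruning: read only the partition matching the filter.
        return [self.partitions[wanted]] if wanted in self.partitions else []

    def scan(self, q: Query) -> Tuple[List[Record], int]:
        """Return the matching records and the number of partitions read."""
        parts = self.partitions_for(q)
        return [r for p in parts for r in p if q.matches(r)], len(parts)


def route(replicas: List[Replica], q: Query) -> Replica:
    """Hypothetical routing rule: pick the replica reading the fewest partitions."""
    return min(replicas, key=lambda rep: len(rep.partitions_for(q)))


devices = ["dev_a", "dev_b", "dev_c"]
records = [Record(d, day, float(day)) for d in devices for day in range(10)]
replicas = [Replica("device", records), Replica("day", records)]

q1 = Query(device="dev_b")  # one device over all days
best1 = route(replicas, q1)
hits1, read1 = best1.scan(q1)
print(best1.key, len(hits1), read1)  # device 10 1

q2 = Query(day=3)  # all devices on one day
best2 = route(replicas, q2)
hits2, read2 = best2.scan(q2)
print(best2.key, len(hits2), read2)  # day 3 1
```

With a single partitioning key, one of the two query shapes above would need a full scan of every partition. Keeping a second replica trades storage for the ability to prune partitions for both access patterns.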