Multi-criteria partitioning on distributed file systems for efficient accelerator data analysis and performance optimization
Since the introduction of the MapReduce paradigm, relational databases have been increasingly replaced by more efficient and scalable architectures, particularly in environments where a single query processes terabytes or even petabytes of data. The same tendency is observed at CER...
Main authors: | Boychenko, Serhiy; Galilee, Marc-Antoine; Garnier, Jean-Christophe; Zenha-Rela, Mario; Zerlauth, Markus |
---|---|
Language: | eng |
Published: | 2018 |
Subjects: | Accelerators and Storage Rings |
Online access: | https://dx.doi.org/10.18429/JACoW-ICALEPCS2017-THPHA036 http://cds.cern.ch/record/2305508 |
_version_ | 1780957561352093696 |
---|---|
author | Boychenko, Serhiy; Galilee, Marc-Antoine; Garnier, Jean-Christophe; Zenha-Rela, Mario; Zerlauth, Markus |
author_sort | Boychenko, Serhiy |
collection | CERN |
description | Since the introduction of the MapReduce paradigm, relational databases have been increasingly replaced by more efficient and scalable architectures, particularly in environments where a single query processes terabytes or even petabytes of data. The same tendency is observed at CERN, where data archiving systems for operational accelerator data are already working well beyond their initially provisioned capacity. Most modern data analysis frameworks are not optimized for heterogeneous workloads such as those that arise in the dynamic environment of one of the world's largest accelerator complexes. This contribution presents Mixed Partitioning Scheme Replication (MPSR) as a solution that will outperform conventional distributed processing environment configurations for almost the entire phase space of data analysis use cases and performance optimization challenges that arise during the commissioning and operational phases of an accelerator. We present the results of a statistical analysis as well as the benchmarking of the implemented prototype, which allow the characteristics of the proposed approach to be defined and the expected performance gains to be confirmed. |
id | oai-inspirehep.net-1656374 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2018 |
record_format | invenio |
title | Multi-criteria partitioning on distributed file systems for efficient accelerator data analysis and performance optimization |
topic | Accelerators and Storage Rings |
url | https://dx.doi.org/10.18429/JACoW-ICALEPCS2017-THPHA036 http://cds.cern.ch/record/2305508 |