
Performance optimisations for distributed analysis in ALICE

Performance is a critical issue in a production system accommodating hundreds of analysis users. Compared to a local session, distributed analysis is exposed to services and network latencies, remote data access and heterogeneous computing infrastructure, creating a more complex performance and effi...


Bibliographic Details
Main authors: Betev, L, Gheata, A, Gheata, M, Grigoras, C, Hristov, P
Language: eng
Published: 2014
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1088/1742-6596/523/1/012014
http://cds.cern.ch/record/2026283
_version_ 1780947338950344704
author Betev, L
Gheata, A
Gheata, M
Grigoras, C
Hristov, P
author_facet Betev, L
Gheata, A
Gheata, M
Grigoras, C
Hristov, P
author_sort Betev, L
collection CERN
description Performance is a critical issue in a production system accommodating hundreds of analysis users. Compared to a local session, distributed analysis is exposed to services and network latencies, remote data access and heterogeneous computing infrastructure, creating a more complex performance and efficiency optimization matrix. During the last 2 years, ALICE analysis shifted from a fast development phase to more mature and stable code. At the same time, the frameworks and tools for deployment, monitoring and management of large productions have evolved considerably too. The ALICE Grid production system is currently used by a fair share of organized and individual user analysis, consuming up to 30% of the available resources and ranging from fully I/O-bound analysis code to CPU-intensive correlations or resonance studies. While the intrinsic analysis performance is unlikely to improve by a large factor during the LHC long shutdown (LS1), the overall efficiency of the system has still to be improved by an important factor to satisfy the analysis needs. We have instrumented all analysis jobs with "sensors" collecting comprehensive monitoring information on the job running conditions and performance in order to identify bottlenecks in the data processing flow. These data are collected by the MonALISA-based ALICE Grid monitoring system and are used to steer and improve the job submission and management policy, to identify operational problems in real time and to perform automatic corrective actions. In parallel with an upgrade of our production system we are aiming for low-level improvements related to data format, data management and merging of results to allow for a better-performing ALICE analysis.
id oai-inspirehep.net-1299891
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2014
record_format invenio
title Performance optimisations for distributed analysis in ALICE
topic Computing and Computers
url https://dx.doi.org/10.1088/1742-6596/523/1/012014
http://cds.cern.ch/record/2026283