Performance optimisations for distributed analysis in ALICE
Performance is a critical issue in a production system accommodating hundreds of analysis users. Compared to a local session, distributed analysis is exposed to services and network latencies, remote data access and heterogeneous computing infrastructure, creating a more complex performance and effi...
Main authors: | Betev, L; Gheata, A; Gheata, M; Grigoras, C; Hristov, P |
---|---|
Language: | eng |
Published: | 2014 |
Subjects: | Computing and Computers |
Online access: | https://dx.doi.org/10.1088/1742-6596/523/1/012014 http://cds.cern.ch/record/2026283 |
author | Betev, L; Gheata, A; Gheata, M; Grigoras, C; Hristov, P |
---|---|
collection | CERN |
description | Performance is a critical issue in a production system accommodating hundreds of analysis users. Compared to a local session, distributed analysis is exposed to service and network latencies, remote data access and a heterogeneous computing infrastructure, creating a more complex performance and efficiency optimization matrix. During the last 2 years, ALICE analysis shifted from a fast development phase to more mature and stable code. At the same time, the frameworks and tools for deployment, monitoring and management of large productions have evolved considerably too. The ALICE Grid production system is currently used by a fair share of organized and individual user analyses, consuming up to 30% of the available resources and ranging from fully I/O-bound analysis code to CPU-intensive correlation or resonance studies. While the intrinsic analysis performance is unlikely to improve by a large factor during the LHC long shutdown (LS1), the overall efficiency of the system still has to be improved by a significant factor to satisfy the analysis needs. We have instrumented all analysis jobs with "sensors" collecting comprehensive monitoring information on the job running conditions and performance in order to identify bottlenecks in the data processing flow. These data are collected by the MonALISA-based ALICE Grid monitoring system and are used to steer and improve the job submission and management policy, to identify operational problems in real time and to perform automatic corrective actions. In parallel with an upgrade of our production system, we are aiming for low-level improvements related to data format, data management and merging of results to allow for better-performing ALICE analysis. |
id | oai-inspirehep.net-1299891 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2014 |
record_format | invenio |
title | Performance optimisations for distributed analysis in ALICE |
topic | Computing and Computers |
url | https://dx.doi.org/10.1088/1742-6596/523/1/012014 http://cds.cern.ch/record/2026283 |
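The description says every analysis job was instrumented with "sensors" reporting running conditions and performance so that bottlenecks can be located, with workloads ranging from fully I/O-bound code to CPU-intensive studies. The sketch below is a hypothetical illustration of how such per-job readings might be reduced to a simple bottleneck label. It is not the ALICE or MonALISA implementation: the `JobSample` fields, the use of CPU efficiency (CPU time divided by wall time) as the indicator, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class JobSample:
    """Hypothetical per-job sensor reading (assumed fields, not the MonALISA schema)."""
    job_id: str
    cpu_time_s: float   # CPU seconds consumed by the job
    wall_time_s: float  # elapsed wall-clock seconds
    bytes_read: int     # input bytes read (local or remote)


def cpu_efficiency(s: JobSample) -> float:
    """Fraction of wall time spent on the CPU; 0.0 when wall time is not positive."""
    if s.wall_time_s <= 0:
        return 0.0
    return s.cpu_time_s / s.wall_time_s


def read_throughput_mb_s(s: JobSample) -> float:
    """Average input rate in MB/s (1 MB = 10**6 bytes); 0.0 when wall time is not positive."""
    if s.wall_time_s <= 0:
        return 0.0
    return s.bytes_read / s.wall_time_s / 1e6


def classify(s: JobSample, io_threshold: float = 0.5) -> str:
    """Label a job 'io-bound' when its CPU efficiency is below the assumed threshold,
    otherwise 'cpu-bound'. Low efficiency may also come from other waits (e.g. services),
    so this is only a first-pass heuristic."""
    return "io-bound" if cpu_efficiency(s) < io_threshold else "cpu-bound"


if __name__ == "__main__":
    samples = [
        JobSample("job-a", cpu_time_s=300.0, wall_time_s=1000.0, bytes_read=5_000_000_000),
        JobSample("job-b", cpu_time_s=950.0, wall_time_s=1000.0, bytes_read=200_000_000),
    ]
    for s in samples:
        print(s.job_id, round(cpu_efficiency(s), 2),
              round(read_throughput_mb_s(s), 1), classify(s))
```

Pairing CPU efficiency with read throughput is one simple way to separate jobs waiting on remote data from jobs limited by computation; a production system would likely combine many more sensor readings.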