
Performance optimisations for distributed analysis in ALICE

Performance is a critical issue in a production system accommodating hundreds of analysis users. Compared to a local session, distributed analysis is exposed to services and network latencies, remote data access and heterogeneous computing infrastructure, creating a more complex performance and effi...


Bibliographic Details
Main authors:	Betev, L, Gheata, A, Gheata, M, Grigoras, C, Hristov, P
Language:	eng
Published:	2014
Subjects:	Computing and Computers
Online access:	https://dx.doi.org/10.1088/1742-6596/523/1/012014 http://cds.cern.ch/record/2026283

author	Betev, L Gheata, A Gheata, M Grigoras, C Hristov, P
author_sort	Betev, L
collection	CERN
description	Performance is a critical issue in a production system accommodating hundreds of analysis users. Compared to a local session, distributed analysis is exposed to service and network latencies, remote data access and heterogeneous computing infrastructure, creating a more complex performance and efficiency optimization matrix. During the last 2 years, ALICE analysis shifted from a fast development phase to more mature and stable code. At the same time, the frameworks and tools for deployment, monitoring and management of large productions have evolved considerably too. The ALICE Grid production system is currently used by a fair share of organized and individual user analysis, consuming up to 30% of the available resources and ranging from fully I/O-bound analysis code to CPU-intensive correlation or resonance studies. While the intrinsic analysis performance is unlikely to improve by a large factor during the LHC long shutdown (LS1), the overall efficiency of the system still has to be improved by an important factor to satisfy the analysis needs. We have instrumented all analysis jobs with "sensors" collecting comprehensive monitoring information on the job running conditions and performance in order to identify bottlenecks in the data processing flow. These data are collected by the MonALISA-based ALICE Grid monitoring system and are used to steer and improve the job submission and management policy, to identify operational problems in real time and to perform automatic corrective actions. In parallel with an upgrade of our production system, we are aiming for low-level improvements related to data format, data management and merging of results to allow for a better-performing ALICE analysis.
id	oai-inspirehep.net-1299891
institution	European Organization for Nuclear Research
language	eng
publishDate	2014
record_format	invenio
title	Performance optimisations for distributed analysis in ALICE
topic	Computing and Computers
url	https://dx.doi.org/10.1088/1742-6596/523/1/012014 http://cds.cern.ch/record/2026283
