Data Sampling methods in the ALICE O$^2$ distributed processing system
The ALICE experiment at the CERN LHC focuses on studying the quark-gluon plasma produced by heavy-ion collisions. Starting from 2021, it will see its input data throughput increase a hundredfold, up to 3.5 TB/s. To cope with such a large amount of data, a new online-offline computing system, called O$^2$, will be deployed. It will synchronously compress the data stream by a factor of 35, down to 100 GB/s, before storing it permanently. One of the key software components of the system will be the data Quality Control (QC). This framework and infrastructure is responsible for all aspects of the analysis software aimed at identifying possible issues with the data itself, and indirectly with the underlying processing done both synchronously and asynchronously. Since analyzing the full stream of data online would exceed the available computational resources, a reliable and efficient sampling will be needed. It should provide a few percent of data, selected randomly in a statistically sound manner, with minimal impact on the main dataflow. Extra requirements include, e.g., the option to choose data corresponding to the same collisions across a group of computing nodes. In this paper the design of the O$^2$ Data Sampling software is presented. In particular, the requirements for pseudo-random number generators to be used for sampling decisions are highlighted, as well as the results of the benchmarks performed to evaluate different options. Finally, a large-scale test of the O$^2$ Data Sampling is reported.
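The abstract mentions pseudo-random sampling decisions and the option to select data from the same collisions across a group of nodes. A minimal sketch of how such a deterministic, seed-shared decision could work (hypothetical illustration only: the actual O$^2$ framework is written in C++, and the function name and hashing scheme below are assumptions, not the paper's implementation):

```python
import hashlib

def should_sample(timeframe_id: int, fraction: float, seed: int = 0) -> bool:
    """Deterministic sampling decision.

    Hashes the (seed, timeframe id) pair to a uniform value in [0, 1) and
    accepts if it falls below the target fraction. Every node sharing the
    same seed reaches the same decision for the same timeframe, so data
    corresponding to the same collisions is selected cluster-wide.
    """
    digest = hashlib.sha256(f"{seed}:{timeframe_id}".encode()).digest()
    # Interpret the first 8 bytes as an unsigned integer, map to [0, 1).
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < fraction

# Example: sample roughly 5% of timeframes, identically on every node.
selected = [tf for tf in range(10_000) if should_sample(tf, 0.05, seed=7)]
```

A hash-based decision trades the raw speed of a stateful PRNG (the paper benchmarks several generators for exactly this reason) for statelessness: any node can evaluate the decision for any timeframe without coordination.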
Main authors: | Konopka, Piotr; von Haller, Barthélémy |
---|---|
Language: | eng |
Published: | 2021 |
Subjects: | Computing and Computers; Particle Physics - Experiment |
Online access: | https://dx.doi.org/10.1016/j.cpc.2020.107581 http://cds.cern.ch/record/2739401 |
_version_ | 1780968170622812160 |
---|---|
author | Konopka, Piotr von Haller, Barthélémy |
collection | CERN |
description | The ALICE experiment at the CERN LHC focuses on studying the quark-gluon plasma produced by heavy-ion collisions. Starting from 2021, it will see its input data throughput increase a hundredfold, up to 3.5 TB/s. To cope with such a large amount of data, a new online-offline computing system, called O$^2$, will be deployed. It will synchronously compress the data stream by a factor of 35 down to 100 GB/s before storing it permanently. One of the key software components of the system will be the data Quality Control (QC). This framework and infrastructure is responsible for all aspects related to the analysis software aimed at identifying possible issues with the data itself, and indirectly with the underlying processing done both synchronously and asynchronously. Since analyzing the full stream of data online would exceed the available computational resources, a reliable and efficient sampling will be needed. It should provide a few percent of data selected randomly in a statistically sound manner with a minimal impact on the main dataflow. Extra requirements include e.g. the option to choose data corresponding to the same collisions over a group of computing nodes. In this paper the design of the O$^2$ Data Sampling software is presented. In particular, the requirements for pseudo-random number generators to be used for sampling decisions are highlighted, as well as the results of the benchmarks performed to evaluate different possibilities. Finally, a large scale test of the O$^2$ Data Sampling is reported. |
id | oai-inspirehep.net-1818373 |
institution | European Organization for Nuclear Research (CERN) |
language | eng |
publishDate | 2021 |
record_format | invenio |
title | Data Sampling methods in the ALICE O$^2$ distributed processing system |
topic | Computing and Computers Particle Physics - Experiment |