
Distributing and storing required data efficiently by means of specifically tailored data formats in the ATLAS collaboration

With the start of the LHC physics program, the ATLAS experiment started to record vast amounts of data. This data has to be distributed and stored on the world-wide computing grid in a smart way in order to enable an effective and efficient analysis by physicists. There are in principle two classes of...


Bibliographic Details
Main author: Köneke, K
Language: eng
Published: 2010
Subjects:
Online access: http://cds.cern.ch/record/1300751
Description
Summary: With the start of the LHC physics program, the ATLAS experiment started to record vast amounts of data. This data has to be distributed and stored on the world-wide computing grid in a smart way in order to enable an effective and efficient analysis by physicists. There are in principle two classes of analysis that are required.

In the commissioning phase of the ATLAS experiment, low-level Event Summary Data (ESD), the result of the event reconstruction, has to be analyzed to evaluate the performance of the individual subdetectors and of the reconstruction and particle identification algorithms, and to obtain calibration coefficients. For later physics analysis, it is usually sufficient to use the Analysis Object Data (AOD), a less detailed version of the ESD.

In the grid model of distributed analysis, these data must be transferred to Tier-2 sites before they can be analyzed. However, the large size of the ESD (~1 MByte/event) constrains the amount of detailed data that can be distributed on the grid and kept available on disk. To overcome this constraint and make the data fully available, new data sets, collectively known as Derived ESD (DESD), have been designed. Each DESD set contains a subset of the ESD data, tailored to the specific needs of the subdetector groups and of the object reconstruction and identification performance groups. Filtering algorithms perform a selection based on physics contents and trigger response, further reducing the data volume. Thanks to these techniques, the total volume of DESD to be distributed on the grid amounts to 20% of the initial ESD data.

Contrary to the ESD, the full AOD is distributed to the Tier-2s. However, the vast number of events and the still sizable volume (~100 kByte/event) render physics analyses run directly on these datasets very inefficient. Several classes of event signatures have been identified, and based on these, several Derived AODs (DAODs) are produced. An event selection is performed based on physics contents and trigger response, reducing the data volume of each of these new DAODs to about 1% of the initial AOD and thereby reducing the required processing time for a given analysis by a factor of 100. In all cases, the selection criteria and other relevant information are stored inside the DESDs and DAODs as meta-data, and a connection to external databases is also established.
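
The event selection described in the summary can be illustrated with a toy sketch. This is not the ATLAS software or its API: the `Event` and `DerivedDataset` classes, the `skim` and `volume_fraction` functions, the trigger name `mu_trigger`, and the muon-count cut are all hypothetical names chosen for illustration. The sketch only shows the general idea of keeping events that pass a trigger and physics-content selection, storing the selection criteria alongside the selected events as meta-data, and measuring the retained volume fraction.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical, heavily simplified event record (not an ATLAS type)."""
    event_number: int
    size_kbyte: float
    fired_triggers: frozenset
    n_muons: int


@dataclass
class DerivedDataset:
    """Selected events plus the selection criteria stored as meta-data."""
    name: str
    metadata: dict
    events: list = field(default_factory=list)


def skim(events, name, required_trigger, min_muons):
    """Keep events that fired required_trigger and have at least min_muons muons."""
    derived = DerivedDataset(
        name=name,
        # The selection criteria travel with the derived dataset.
        metadata={"required_trigger": required_trigger, "min_muons": min_muons},
    )
    for event in events:
        passes_trigger = required_trigger in event.fired_triggers
        passes_physics = event.n_muons >= min_muons
        if passes_trigger and passes_physics:
            derived.events.append(event)
    return derived


def volume_fraction(derived, source_events):
    """Fraction of the source data volume retained in the derived dataset."""
    total = sum(e.size_kbyte for e in source_events)
    kept = sum(e.size_kbyte for e in derived.events)
    return kept / total if total else 0.0


# Toy input (assumption): 1000 AOD-like events of 100 kByte each,
# constructed so that every 100th event passes the selection.
aod = [
    Event(
        event_number=i,
        size_kbyte=100.0,
        fired_triggers=frozenset({"mu_trigger"}) if i % 100 == 0 else frozenset(),
        n_muons=2 if i % 100 == 0 else 0,
    )
    for i in range(1000)
]

daod = skim(aod, "toy_dimuon_DAOD", required_trigger="mu_trigger", min_muons=2)
fraction = volume_fraction(daod, aod)
print(len(daod.events), fraction, daod.metadata)
```

In this toy input the pass rate is fixed by construction at one event in a hundred, which mirrors the roughly 1% DAOD volume quoted in the summary. If processing time scales with the number of events read, a 1% selection corresponds to the quoted factor of 100 reduction.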