Atlas event production on the EGEE infrastructure

ATLAS, one of the four LHC (Large Hadron Collider) experiments at CERN, is devoted to the study of proton-proton and ion-ion collisions at 14 TeV. The ATLAS collaboration comprises about 2000 scientists spread around the world. The experiment's requirements for next year amount to about 300 TB o...

Full description

Bibliographic Details
Main Authors: Espinal, X; Campana, S; Perini, L; Rod, W
Language: eng
Published: 2007
Subjects: Detectors and Experimental Techniques; Computing and Computers
Online Access: http://cds.cern.ch/record/1120790
_version_ 1780914565871042560
author Espinal, X
Campana, S
Perini, L
Rod, W
author_facet Espinal, X
Campana, S
Perini, L
Rod, W
author_sort Espinal, X
collection CERN
description ATLAS, one of the four LHC (Large Hadron Collider) experiments at CERN, is devoted to the study of proton-proton and ion-ion collisions at 14 TeV. The ATLAS collaboration comprises about 2000 scientists spread around the world. The experiment's requirements for next year amount to about 300 TB of storage and about 13 MSI2k of CPU power, and rely on the GRID philosophy and the EGEE infrastructure. Simulated events are distributed over EGEE by the ATLAS production system. Data have to be processed and must be accessible to a large number of scientists for analysis. The data throughput of the ATLAS experiment is expected to be 320 MB/s, with an integrated data volume of ~10 PB per year. Because the LHC requirements are so demanding, processing and storage need a distributed pool of resources, spread worldwide and interconnected with GRID technologies. In that sense, event production is the way to produce, process and store data for analysis before the experiment start-up, and it is performed in a distributed way. Tasks are defined by physics coordinators and then assigned to Computing Elements (CEs) spread worldwide. Some of the jobs that make up a task also need input data to produce new output, which means jobs may need input from external sites and may store their output remotely. For that reason, sites are connected by File Transfer Service (FTS) channels that link the Storage Element (SE) interfaces of each site. ATLAS uses the services provided by the EGEE middleware. Event simulation jobs are sent to the LCG (LHC Computing Grid) by gLite-WMS (Workload Management System) and Condor-G, using the dispatching tools of the CEs. Event simulation jobs also perform the data management: they request their inputs and store their outputs on the desired SEs, while file location and information are managed with distributed LCG File Catalogues (LFC). Asymmetric file movement, on the other hand, is performed by ATLAS-specific software, the Distributed Data Management (DDM) system, which takes care of file movement on top of the FTS services. The services causing problems are mainly the Storage Elements: the system depends strongly on the inputs to the event simulation jobs, and failure to retrieve them produces job failures, while failures in storing the outputs due to SE instabilities lead to the loss of the CPU time consumed by the job and its consequent failure. For event simulation, gLite-WMS is expected to handle the jobs more reliably and, concerning the CEs, different implementations without scalability limitations could be introduced. We certainly hope that a new implementation of the SRM (Storage Resource Manager) interface will solve the stability problems, mainly in the stage-in and stage-out of the files needed and produced by the jobs, respectively.
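The stage-in/simulate/stage-out flow described above, and the two failure modes it distinguishes (a cheap failure at stage-in versus an expensive one at stage-out, after CPU has been consumed), can be illustrated with a minimal sketch. This is not ATLAS production-system code: the wrapper and the helper names (stage_in, stage_out, run_simulation), the file paths and the destination SE are all hypothetical; only lcg-cp and lcg-cr are real LCG data-management commands, which resolve logical file names through the LFC and reach the SEs via their SRM interface.

# Hypothetical sketch of an event-simulation job wrapper (assumptions
# noted above); Python is used only for illustration.
import subprocess
import sys

VO = "atlas"

def stage_in(lfn, local_path):
    """Copy one input replica from a Storage Element to the worker node."""
    cmd = ["lcg-cp", "--vo", VO, "lfn:" + lfn, "file:" + local_path]
    return subprocess.run(cmd).returncode == 0

def stage_out(local_path, lfn, dest_se):
    """Copy an output file to a Storage Element and register it in the LFC."""
    cmd = ["lcg-cr", "--vo", VO, "-d", dest_se, "-l", "lfn:" + lfn,
           "file:" + local_path]
    return subprocess.run(cmd).returncode == 0

def run_simulation(input_path, output_path):
    """Placeholder for the CPU-expensive simulation step (hypothetical)."""
    cmd = ["atlas_sim.sh", input_path, output_path]
    return subprocess.run(cmd).returncode == 0

def main():
    # A stage-in failure aborts the job before any CPU time is spent.
    if not stage_in("/grid/atlas/input/evgen.pool.root", "/tmp/input.root"):
        print("stage-in failed: SE unavailable", file=sys.stderr)
        return 1
    if not run_simulation("/tmp/input.root", "/tmp/output.root"):
        return 2
    # A stage-out failure is the expensive mode the abstract describes:
    # the CPU time already consumed by the simulation is lost with the job.
    if not stage_out("/tmp/output.root", "/grid/atlas/output/simul.pool.root",
                     "srm.example-site.org"):
        print("stage-out failed: output and CPU time lost", file=sys.stderr)
        return 3
    return 0

if __name__ == "__main__":
    sys.exit(main())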
id cern-1120790
institution European Organization for Nuclear Research
language eng
publishDate 2007
record_format invenio
spelling cern-1120790 2019-09-30T06:29:59Z http://cds.cern.ch/record/1120790 eng Espinal, X; Campana, S; Perini, L; Rod, W. Atlas event production on the EGEE infrastructure. Detectors and Experimental Techniques; Computing and Computers. oai:cds.cern.ch:1120790 2007
spellingShingle Detectors and Experimental Techniques
Computing and Computers
Espinal, X
Campana, S
Perini, L
Rod, W
Atlas event production on the EGEE infrastructure
title Atlas event production on the EGEE infrastructure
title_full Atlas event production on the EGEE infrastructure
title_fullStr Atlas event production on the EGEE infrastructure
title_full_unstemmed Atlas event production on the EGEE infrastructure
title_short Atlas event production on the EGEE infrastructure
title_sort atlas event production on the egee infrastructure
topic Detectors and Experimental Techniques
Computing and Computers
url http://cds.cern.ch/record/1120790
work_keys_str_mv AT espinalx atlaseventproductionontheegeeinfrastructure
AT campanas atlaseventproductionontheegeeinfrastructure
AT perinil atlaseventproductionontheegeeinfrastructure
AT rodw atlaseventproductionontheegeeinfrastructure