Atlas event production on the EGEE infrastructure

ATLAS, one of the four LHC (Large Hadron Collider) experiments at CERN, is devoted to the study of proton-proton and ion-ion collisions at 14 TeV. The ATLAS collaboration comprises about 2000 scientists spread around the world. The experiment's requirements for next year amount to about 300 TB o...

Full description

Bibliographic Details
Main Authors: Espinal, X; Campana, S; Perini, L; Rod, W
Language: eng
Published: 2007
Subjects: Detectors and Experimental Techniques; Computing and Computers
Online Access: http://cds.cern.ch/record/1120790
_version_ 1780914565871042560
author Espinal, X
Campana, S
Perini, L
Rod, W
author_facet Espinal, X
Campana, S
Perini, L
Rod, W
author_sort Espinal, X
collection CERN
description ATLAS, one of the four LHC (Large Hadron Collider) experiments at CERN, is devoted to the study of proton-proton and ion-ion collisions at 14 TeV. The ATLAS collaboration comprises about 2000 scientists spread around the world. The experiment's requirements for next year amount to about 300 TB of storage and about 13 MSI2k of CPU power, and rely on the GRID philosophy and the EGEE infrastructure. Simulated events are distributed over EGEE by the ATLAS production system. Data have to be processed and must be accessible to a large number of scientists for analysis. The data throughput of the ATLAS experiment is expected to be 320 MB/s, with an integrated data volume of ~10 PB per year. Because the LHC requirements are so demanding, processing and storage need a distributed pool of resources, spread worldwide and interconnected with GRID technologies. In that sense, event production is the way to produce, process and store data for analysis before the experiment start-up, and it is performed in a distributed way. Tasks are defined by physics coordinators and then assigned to Computing Elements (CEs) spread worldwide. Some of the jobs that make up a task also need input data to produce new output, which means jobs may need input from external sites and may store their output remotely. For that reason, sites are connected by File Transfer Service (FTS) channels that link the Storage Element (SE) interfaces of each site. ATLAS uses the services provided by the EGEE middleware. Event simulation jobs are sent to the LCG (LHC Computing Grid) by gLite-WMS (Workload Management System) and Condor-G, using the dispatching tools of the CEs. Event simulation jobs also perform the data management: they request their inputs and store their outputs on the desired SEs, while file location and information are managed with distributed LCG File Catalogues (LFC). Asymmetric file movement, on the other hand, is performed by ATLAS-specific software, the Distributed Data Management (DDM) system, which takes care of file movement on top of the FTS services. The services causing problems are mainly the Storage Elements: the system depends strongly on the inputs to the event simulation jobs, and failure to retrieve them produces job failures, while failures in storing the outputs due to SE instabilities lead to the loss of the CPU time consumed by the job and its consequent failure. For event simulation, gLite-WMS is expected to handle the jobs more reliably and, concerning the CEs, different implementations without scalability limitations could be introduced. We certainly hope that a new implementation of the SRM (Storage Resource Manager) interface will solve the stability problems, mainly in the stage-in and stage-out of the files needed and produced by the jobs, respectively.
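The stage-in/simulate/stage-out flow described above, and the two failure modes it distinguishes (a cheap failure at stage-in versus an expensive one at stage-out, after CPU has been consumed), can be illustrated with a minimal sketch. This is not ATLAS production-system code: the wrapper and the helper names (stage_in, stage_out, run_simulation), the file paths and the destination SE are all hypothetical; only lcg-cp and lcg-cr are real LCG data-management commands, which resolve logical file names through the LFC and reach the SEs via their SRM interface.

# Hypothetical sketch of an event-simulation job wrapper (assumptions
# noted above); Python is used only for illustration.
import subprocess
import sys

VO = "atlas"

def stage_in(lfn, local_path):
    """Copy one input replica from a Storage Element to the worker node."""
    cmd = ["lcg-cp", "--vo", VO, "lfn:" + lfn, "file:" + local_path]
    return subprocess.run(cmd).returncode == 0

def stage_out(local_path, lfn, dest_se):
    """Copy an output file to a Storage Element and register it in the LFC."""
    cmd = ["lcg-cr", "--vo", VO, "-d", dest_se, "-l", "lfn:" + lfn,
           "file:" + local_path]
    return subprocess.run(cmd).returncode == 0

def run_simulation(input_path, output_path):
    """Placeholder for the CPU-expensive simulation step (hypothetical)."""
    cmd = ["atlas_sim.sh", input_path, output_path]
    return subprocess.run(cmd).returncode == 0

def main():
    # A stage-in failure aborts the job before any CPU time is spent.
    if not stage_in("/grid/atlas/input/evgen.pool.root", "/tmp/input.root"):
        print("stage-in failed: SE unavailable", file=sys.stderr)
        return 1
    if not run_simulation("/tmp/input.root", "/tmp/output.root"):
        return 2
    # A stage-out failure is the expensive mode the abstract describes:
    # the CPU time already consumed by the simulation is lost with the job.
    if not stage_out("/tmp/output.root", "/grid/atlas/output/simul.pool.root",
                     "srm.example-site.org"):
        print("stage-out failed: output and CPU time lost", file=sys.stderr)
        return 3
    return 0

if __name__ == "__main__":
    sys.exit(main())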
id cern-1120790
institution European Organization for Nuclear Research
language eng
publishDate 2007
record_format invenio
spelling cern-1120790 2019-09-30T06:29:59Z http://cds.cern.ch/record/1120790 eng Espinal, X; Campana, S; Perini, L; Rod, W. Atlas event production on the EGEE infrastructure. Detectors and Experimental Techniques; Computing and Computers. oai:cds.cern.ch:1120790 2007
spellingShingle Detectors and Experimental Techniques
Computing and Computers
Espinal, X
Campana, S
Perini, L
Rod, W
Atlas event production on the EGEE infrastructure
title Atlas event production on the EGEE infrastructure
title_full Atlas event production on the EGEE infrastructure
title_fullStr Atlas event production on the EGEE infrastructure
title_full_unstemmed Atlas event production on the EGEE infrastructure
title_short Atlas event production on the EGEE infrastructure
title_sort atlas event production on the egee infrastructure
topic Detectors and Experimental Techniques
Computing and Computers
url http://cds.cern.ch/record/1120790
work_keys_str_mv AT espinalx atlaseventproductionontheegeeinfrastructure
AT campanas atlaseventproductionontheegeeinfrastructure
AT perinil atlaseventproductionontheegeeinfrastructure
AT rodw atlaseventproductionontheegeeinfrastructure