ATLAS Distributed Computing experience and performance during the LHC Run-2
Main author: | Filipcic, Andrej
Language: | eng
Published: | 2016
Subjects: | Particle Physics - Experiment
Online access: | http://cds.cern.ch/record/2218083
_version_ | 1780952139340709888 |
author | Filipcic, Andrej |
author_facet | Filipcic, Andrej |
author_sort | Filipcic, Andrej |
collection | CERN |
description | ATLAS Distributed Computing during LHC Run-1 was challenged by steadily increasing computing, storage and network requirements. In addition, the complexity of processing task workflows and their associated data management requirements led to a new paradigm in the ATLAS computing model for Run-2, accompanied by extensive evolution and redesign of the workflow and data management systems. The new systems were put into production at the end of 2014, and gained robustness and maturity during 2015 data taking. ProdSys2, the new request and task interface; JEDI, the dynamic job execution engine developed as an extension to PanDA; and Rucio, the new data management system, form the core of the Run-2 ATLAS distributed computing engine. One of the big changes for Run-2 was the adoption of the Derivation Framework, which moves the chaotic, CPU- and data-intensive part of user analysis into centrally organized train production, delivering derived AOD datasets to user groups for final analysis. The effectiveness of the new model was demonstrated by delivering analysis datasets to users just one week after data taking, with the calibration loop, Tier-0 processing and train production steps all completed promptly. The great flexibility of the new system also makes it possible to execute part of the Tier-0 processing on the grid when Tier-0 resources experience a backlog during high data-taking periods. The data lifetime model, in which each dataset is assigned a finite lifetime (with extensions possible for frequently accessed data), was made possible by Rucio; thanks to this, the storage crises experienced in Run-1 have not reappeared during Run-2. In addition, the distinction between Tier-1 and Tier-2 disk storage, now largely artificial given the quality of Tier-2 resources and their networking, has been removed through the introduction of dynamic ATLAS clouds that group a storage-endpoint nucleus with its nearby satellite execution sites. All stable ATLAS sites are now able to store unique or primary copies of datasets. ATLAS Distributed Computing is evolving further to speed up request processing by introducing network awareness, using machine learning, and optimizing latencies during the execution of the full chain of tasks. The Event Service, a new workflow and job execution engine, is designed around checkpointing at the level of event processing to use opportunistic resources more efficiently. |
id | cern-2218083 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2016 |
record_format | invenio |
spelling | cern-2218083; 2019-09-30T06:29:59Z; http://cds.cern.ch/record/2218083; eng; Filipcic, Andrej; ATLAS Distributed Computing experience and performance during the LHC Run-2; Particle Physics - Experiment; ATL-SOFT-SLIDE-2016-701; oai:cds.cern.ch:2218083; 2016-09-25 |
spellingShingle | Particle Physics - Experiment Filipcic, Andrej ATLAS Distributed Computing experience and performance during the LHC Run-2 |
title | ATLAS Distributed Computing experience and performance during the LHC Run-2 |
title_full | ATLAS Distributed Computing experience and performance during the LHC Run-2 |
title_fullStr | ATLAS Distributed Computing experience and performance during the LHC Run-2 |
title_full_unstemmed | ATLAS Distributed Computing experience and performance during the LHC Run-2 |
title_short | ATLAS Distributed Computing experience and performance during the LHC Run-2 |
title_sort | atlas distributed computing experience and performance during the lhc run-2 |
topic | Particle Physics - Experiment |
url | http://cds.cern.ch/record/2218083 |
work_keys_str_mv | AT filipcicandrej atlasdistributedcomputingexperienceandperformanceduringthelhcrun2 |