
The ATLAS Production System Evolution: New Data Processing and Analysis Paradigm for the LHC Run2 and High-Luminosity

The second generation of the ATLAS Production System, called ProdSys2, is a distributed workload manager that runs hundreds of thousands of jobs daily, from dozens of different ATLAS-specific workflows, across more than a hundred heterogeneous sites. It achieves high utilization by combining dynamic job...


Bibliographic Details
Main authors: Borodin, Mikhail; Barreiro Megino, Fernando Harald; De, Kaushik; Golubkov, Dmitry; Klimentov, Alexei; Maeno, Tadashi; Mashinistov, Ruslan; Padolski, Siarhei; Wenaus, Torre
Language: eng
Published: 2017
Subjects: Particle Physics - Experiment
Online access: https://dx.doi.org/10.1088/1742-6596/898/5/052016
http://cds.cern.ch/record/2244614
author Borodin, Mikhail
Barreiro Megino, Fernando Harald
De, Kaushik
Golubkov, Dmitry
Klimentov, Alexei
Maeno, Tadashi
Mashinistov, Ruslan
Padolski, Siarhei
Wenaus, Torre
collection CERN
description The second generation of the ATLAS Production System, called ProdSys2, is a distributed workload manager that runs hundreds of thousands of jobs daily, from dozens of different ATLAS-specific workflows, across more than a hundred heterogeneous sites. It achieves high utilization by combining dynamic job definition based on many criteria, such as input and output size, memory requirements and CPU consumption, with manageable scheduling policies, and by supporting different kinds of computational resources, such as Grid, clouds, supercomputers and volunteer computers. The system dynamically assigns a group of jobs (a task) to a group of geographically distributed computing resources. Dynamic assignment and resource utilization is one of the major features of the system; it did not exist in the earliest versions of the production system, where the Grid resource topology was predefined using national and/or geographical patterns. The Production System has a sophisticated job fault-recovery mechanism, which allows multi-terabyte tasks to run efficiently without human intervention. We have implemented a "train" model and open-ended production, which allow tasks to be submitted automatically as soon as a new set of data is available, and allow physics groups' data processing and analysis to be chained with the central production run by the experiment. We present an overview of the ATLAS Production System and the features and architecture of its major components: task definition, web user interface and monitoring. We describe the important design decisions and the lessons learned from operational experience during the first years of LHC Run2. We also report the performance of the designed system and how various workflows, such as data (re)processing, Monte Carlo and physics group production, and user analysis, are scheduled and executed within one production system on heterogeneous computing resources.
id cern-2244614
institution European Organization for Nuclear Research
language eng
publishDate 2017
record_format invenio
title The ATLAS Production System Evolution: New Data Processing and Analysis Paradigm for the LHC Run2 and High-Luminosity
topic Particle Physics - Experiment
url https://dx.doi.org/10.1088/1742-6596/898/5/052016
http://cds.cern.ch/record/2244614