gLExec Integration with the ATLAS PanDA Workload Management System

The ATLAS Experiment at the Large Hadron Collider has collected data during Run 1 and is ready to collect data in Run 2. The ATLAS data are distributed, processed and analysed at more than 130 grid and cloud sites across the world. At any given time there are more than 150,000 concurrent jobs running, and about a million jobs are submitted on a daily basis on behalf of thousands of physicists within the ATLAS collaboration. The Production and Distributed Analysis (PanDA) workload management system has proved to be a key component of ATLAS and plays a crucial role in the success of its large-scale distributed computing: it has been the sole system for distributed processing of Grid jobs across the collaboration since October 2007. ATLAS user jobs are executed on worker nodes by pilots sent to the sites by pilot factories. This pilot architecture has greatly improved job reliability and has clear advantages, such as making the working environment homogeneous by hiding any site heterogeneities; however, the approach presents security and traceability issues distinct from those of standard batch jobs, for which the submitter is also the payload owner. Jobs initially inherit the identity of the pilot submitter, typically a robot certificate with very limited rights, and by default the payload jobs then execute directly under that same identity on a worker node. This exposes the pilot environment to the payload, requiring any pilot 'secrets' such as the proxy to be hidden; it constrains the rights and identity of the user job to be identical to those of the pilot; and it requires sites to take extra measures to achieve user traceability and user job isolation. To address these security risks, the gLExec tool and framework can be used to let the payloads of each user be executed under a different UNIX user identity that uniquely identifies the ATLAS user. This presentation describes the recent improvements and evolution of the security model within the ATLAS PanDA system, including improvements in the PanDA pilot and the PanDA server and their integration with MyProxy, a credential caching system that entitles a person or a service to act in the name of the issuer of the credential. Finally, we present results from ATLAS user jobs running with gLExec and give an insight into future deployment plans.
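
The identity-switch mechanism described in the abstract can be illustrated with a short sketch. The following Python fragment shows how a pilot might fetch a user's delegated credential from MyProxy and then hand the payload to gLExec so it runs under the user's mapped UNIX account rather than the pilot's. This is not the actual PanDA pilot code: the server name, file paths, payload script and helper functions are hypothetical, and only the gLExec environment variables (GLEXEC_CLIENT_CERT, GLEXEC_SOURCE_PROXY, GLEXEC_TARGET_PROXY) and the myproxy-logon command are real interfaces.

```python
import os
import subprocess

# Hypothetical values for illustration only; a real pilot takes these
# from the site and PanDA server configuration.
MYPROXY_SERVER = "myproxy.example.org"
GLEXEC = "/usr/sbin/glexec"          # common gLExec install location
USER_PROXY = "/tmp/user_proxy.pem"   # where the fetched proxy is stored

def fetch_user_proxy(credential_name, out_path):
    """Retrieve the user's delegated credential from MyProxy.

    Assumes the pilot's robot certificate is registered as an
    authorized retriever, so no passphrase is needed (-n)."""
    subprocess.run(
        ["myproxy-logon", "-s", MYPROXY_SERVER,
         "-l", credential_name, "-n", "-o", out_path],
        check=True)

def run_payload_via_glexec(user_proxy, payload_cmd):
    """Execute the payload under the UNIX account that gLExec maps
    the user's proxy to, instead of the pilot's own identity."""
    env = os.environ.copy()
    # gLExec authorizes the request with the credential pointed to
    # by GLEXEC_CLIENT_CERT and maps it to a local account.
    env["GLEXEC_CLIENT_CERT"] = user_proxy
    # Have gLExec copy the proxy to a file owned by the target
    # account, so the payload can still use it after the switch.
    env["GLEXEC_SOURCE_PROXY"] = user_proxy
    env["GLEXEC_TARGET_PROXY"] = user_proxy + ".target"
    return subprocess.run([GLEXEC] + payload_cmd, env=env).returncode

if __name__ == "__main__":
    fetch_user_proxy("atlas_user_credential", USER_PROXY)
    rc = run_payload_via_glexec(USER_PROXY, ["/bin/sh", "runjob.sh"])
    print("payload exit code:", rc)
```

In the deployed system a wrapper executed via gLExec points the payload at the relocated proxy; the error handling, retries and cleanup that a production pilot needs are omitted here.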

Bibliographic Details

Main authors: Karavakis, Edward; Barreiro Megino, Fernando Harald; Campana, Simone; De, Kaushik; Di Girolamo, Alessandro; Litmaath, Maarten; Maeno, Tadashi; Medrano Llamas, Ramon; Nilsson, Paul; Wenaus, Torre
Language: eng
Published: 2015
Subjects: Particle Physics - Experiment
Online access: http://cds.cern.ch/record/2001856
Record ID: cern-2001856 (oai:cds.cern.ch:2001856)
Report number: ATL-SOFT-SLIDE-2015-070
Collection: CERN
Institution: European Organization for Nuclear Research
Record date: 2015-03-16