
PanDA for ATLAS Distributed Computing in the Next Decade


Bibliographic Details
Main Authors: Barreiro Megino, Fernando Harald, Klimentov, Alexei, De, Kaushik, Maeno, Tadashi, Nilsson, Paul, Oleynik, Danila, Padolski, Siarhei, Panitkin, Sergey, Wenaus, Torre
Language: eng
Published: 2016
Subjects: Particle Physics - Experiment
Online Access: http://cds.cern.ch/record/2218080
_version_ 1780952138912890880
author Barreiro Megino, Fernando Harald
Klimentov, Alexei
De, Kaushik
Maeno, Tadashi
Nilsson, Paul
Oleynik, Danila
Padolski, Siarhei
Panitkin, Sergey
Wenaus, Torre
author_facet Barreiro Megino, Fernando Harald
Klimentov, Alexei
De, Kaushik
Maeno, Tadashi
Nilsson, Paul
Oleynik, Danila
Padolski, Siarhei
Panitkin, Sergey
Wenaus, Torre
author_sort Barreiro Megino, Fernando Harald
collection CERN
description The Production and Distributed Analysis (PanDA) system has been developed to meet ATLAS production and analysis requirements for a data-driven workload management system capable of operating at the Large Hadron Collider (LHC) data processing scale. The heterogeneous resources used by the ATLAS experiment are distributed worldwide at hundreds of sites, thousands of physicists analyse the data remotely, the volume of processed data is beyond the exabyte scale, dozens of scientific applications are supported, and data processing requires a few billion hours of computing usage per year. PanDA performed very well over the last decade, including the LHC Run 1 data-taking period. However, it was decided to upgrade the whole system concurrently with the LHC's first long shutdown in order to cope with the rapidly changing computing infrastructure. After two years of reengineering effort, PanDA has embedded capabilities for fully dynamic and flexible workload management. The static batch-job paradigm was discarded in favor of a more automated and scalable model. Workloads are dynamically tailored for optimal usage of resources, with the brokerage taking network traffic and forecasts into account. Computing resources are partitioned based on dynamic knowledge of their status and characteristics. The pilot has been refactored around a plugin structure for easier development and deployment. Bookkeeping is handled at both coarse and fine granularities for efficient utilization of pledged or opportunistic resources. Leveraging direct remote data access and federated storage relaxes the geographical coupling between processing and data. An in-house security mechanism authenticates the pilot and the data management services in off-grid environments such as volunteer computing and private local clusters.
The PanDA monitor has been extensively optimized for performance and extended with analytics to provide aggregated summaries of the system as well as drill-down to operational details. Many other improvements are planned or have recently been implemented, and the system has been adopted by non-LHC experiments, such as bioinformatics groups successfully running Paleomix (microbial genome and metagenome) payloads on supercomputers. In this talk we will focus on the new and planned features that are most important to the next decade of distributed computing workload management.
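The brokerage behaviour described in the abstract — dynamically assigning workloads to resources with capacity and network forecasts taken into account — can be sketched as follows. This is an illustrative toy model only; the site attributes and function names are invented here and are not PanDA's actual schema or API.

```python
# Toy broker: rank candidate sites for a workload by available capacity
# and a network-throughput forecast, in the spirit of the brokerage
# described in the abstract. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_slots: int               # currently available job slots
    network_forecast_mbps: float  # predicted throughput to the input data

def broker(sites, slots_needed):
    """Pick a site with enough capacity, preferring the best network forecast."""
    candidates = [s for s in sites if s.free_slots >= slots_needed]
    if not candidates:
        return None  # no site can currently host this workload
    return max(candidates, key=lambda s: s.network_forecast_mbps)

sites = [
    Site("SITE_A", free_slots=500,  network_forecast_mbps=80.0),
    Site("SITE_B", free_slots=50,   network_forecast_mbps=950.0),
    Site("SITE_C", free_slots=2000, network_forecast_mbps=400.0),
]
best = broker(sites, slots_needed=100)
print(best.name)  # SITE_C: SITE_B lacks capacity, SITE_C beats SITE_A on network
```

A real brokerage would combine many more signals (data locality, site status, pledges, job priorities), but the pattern — filter by hard constraints, then rank by soft metrics — is the same.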
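The plugin structure the abstract attributes to the refactored pilot can be illustrated with a minimal registry pattern. The classes and names below are invented for this sketch and do not reflect the real PanDA pilot code.

```python
# Minimal plugin registry, illustrating the kind of structure a plugin-based
# pilot could use to swap data-movement backends per site. Hypothetical names.
class CopyTool:
    """Base interface every data-movement plugin implements."""
    def copy(self, src, dst):
        raise NotImplementedError

PLUGINS = {}

def register(name):
    """Class decorator: register a CopyTool implementation under a lookup name."""
    def wrap(cls):
        PLUGINS[name] = cls
        return cls
    return wrap

@register("local")
class LocalCopy(CopyTool):
    def copy(self, src, dst):
        return f"cp {src} {dst}"

@register("remote")
class RemoteCopy(CopyTool):
    def copy(self, src, dst):
        return f"xrdcp {src} {dst}"  # direct remote access, as the abstract mentions

def get_tool(name):
    """Instantiate whichever plugin the site configuration names."""
    return PLUGINS[name]()

print(get_tool("remote").copy("root://x/f", "/tmp/f"))  # xrdcp root://x/f /tmp/f
```

The point of such a structure is the one the abstract makes: new behaviours can be developed and deployed as plugins without touching the pilot core.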
id cern-2218080
institution European Organization for Nuclear Research
language eng
publishDate 2016
record_format invenio
spelling cern-2218080 2019-09-30T06:29:59Z http://cds.cern.ch/record/2218080 eng Barreiro Megino, Fernando Harald; Klimentov, Alexei; De, Kaushik; Maeno, Tadashi; Nilsson, Paul; Oleynik, Danila; Padolski, Siarhei; Panitkin, Sergey; Wenaus, Torre. PanDA for ATLAS Distributed Computing in the Next Decade. Particle Physics - Experiment. The Production and Distributed Analysis (PanDA) system has been developed to meet ATLAS production and analysis requirements for a data-driven workload management system capable of operating at the Large Hadron Collider (LHC) data processing scale. The heterogeneous resources used by the ATLAS experiment are distributed worldwide at hundreds of sites, thousands of physicists analyse the data remotely, the volume of processed data is beyond the exabyte scale, dozens of scientific applications are supported, and data processing requires a few billion hours of computing usage per year. PanDA performed very well over the last decade, including the LHC Run 1 data-taking period. However, it was decided to upgrade the whole system concurrently with the LHC's first long shutdown in order to cope with the rapidly changing computing infrastructure. After two years of reengineering effort, PanDA has embedded capabilities for fully dynamic and flexible workload management. The static batch-job paradigm was discarded in favor of a more automated and scalable model. Workloads are dynamically tailored for optimal usage of resources, with the brokerage taking network traffic and forecasts into account. Computing resources are partitioned based on dynamic knowledge of their status and characteristics. The pilot has been refactored around a plugin structure for easier development and deployment. Bookkeeping is handled at both coarse and fine granularities for efficient utilization of pledged or opportunistic resources. Leveraging direct remote data access and federated storage relaxes the geographical coupling between processing and data.
An in-house security mechanism authenticates the pilot and the data management services in off-grid environments such as volunteer computing and private local clusters. The PanDA monitor has been extensively optimized for performance and extended with analytics to provide aggregated summaries of the system as well as drill-down to operational details. Many other improvements are planned or have recently been implemented, and the system has been adopted by non-LHC experiments, such as bioinformatics groups successfully running Paleomix (microbial genome and metagenome) payloads on supercomputers. In this talk we will focus on the new and planned features that are most important to the next decade of distributed computing workload management. ATL-SOFT-SLIDE-2016-699 oai:cds.cern.ch:2218080 2016-09-25
spellingShingle Particle Physics - Experiment
Barreiro Megino, Fernando Harald
Klimentov, Alexei
De, Kaushik
Maeno, Tadashi
Nilsson, Paul
Oleynik, Danila
Padolski, Siarhei
Panitkin, Sergey
Wenaus, Torre
PanDA for ATLAS Distributed Computing in the Next Decade
title PanDA for ATLAS Distributed Computing in the Next Decade
title_full PanDA for ATLAS Distributed Computing in the Next Decade
title_fullStr PanDA for ATLAS Distributed Computing in the Next Decade
title_full_unstemmed PanDA for ATLAS Distributed Computing in the Next Decade
title_short PanDA for ATLAS Distributed Computing in the Next Decade
title_sort panda for atlas distributed computing in the next decade
topic Particle Physics - Experiment
url http://cds.cern.ch/record/2218080
work_keys_str_mv AT barreiromeginofernandoharald pandaforatlasdistributedcomputinginthenextdecade
AT klimentovalexei pandaforatlasdistributedcomputinginthenextdecade
AT dekaushik pandaforatlasdistributedcomputinginthenextdecade
AT maenotadashi pandaforatlasdistributedcomputinginthenextdecade
AT nilssonpaul pandaforatlasdistributedcomputinginthenextdecade
AT oleynikdanila pandaforatlasdistributedcomputinginthenextdecade
AT padolskisiarhei pandaforatlasdistributedcomputinginthenextdecade
AT panitkinsergey pandaforatlasdistributedcomputinginthenextdecade
AT wenaustorre pandaforatlasdistributedcomputinginthenextdecade