
Raythena: a vertically integrated scheduler for ATLAS applications on heterogeneous distributed resources

Bibliographic details
Main authors: Muškinja, Miha; Calafiura, Paolo; Leggett, Charles; Shapoval, Illya; Tsulaia, Vakho
Language: English
Published: 2020
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1051/epjconf/202024505042
http://cds.cern.ch/record/2752848
Collection: CERN
Abstract: The ATLAS experiment has successfully integrated High-Performance Computing resources (HPCs) into its production system. Unlike the current generation of HPC systems and the LHC computing grid, the next generation of supercomputers is expected to be extremely heterogeneous in nature: different systems will have radically different architectures, and most of them will provide partitions optimized for different kinds of workloads. In this work we explore the applicability of concepts and tools realized in Ray (the high-performance distributed execution framework targeting large-scale machine learning applications) to ATLAS event throughput optimization on heterogeneous distributed resources, ranging from traditional grid clusters to Exascale computers. We present a prototype of Raythena, a Ray-based implementation of the ATLAS Event Service (AES), a fine-grained event processing workflow aimed at improving the efficiency of ATLAS workflows on opportunistic resources, specifically HPCs. The AES is implemented as an event processing task farm that distributes packets of events to several worker processes running on multiple nodes. Each worker in the task farm runs an event-processing application (Athena) as a daemon. The whole system is orchestrated by Ray, which assigns work in a distributed, possibly heterogeneous, environment. For all its flexibility, the AES implementation currently consists of multiple separate layers that communicate through ad hoc command-line and file-based interfaces. The goal of Raythena is to integrate these layers through a feature-rich, efficient application framework. Besides increasing usability and robustness, a vertically integrated scheduler will enable us to explore advanced concepts such as dynamic shaping of workflows to exploit currently available resources, particularly on heterogeneous systems.
Record ID: oai-inspirehep.net-1832179
Institution: European Organization for Nuclear Research
Record format: invenio
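The task-farm pattern summarized in the abstract (packets of events handed out to long-lived workers) can be illustrated with a minimal, standard-library Python sketch. This is not Raythena's or Ray's API: the thread-based workers, the `process_event` placeholder standing in for an Athena daemon, and the helper names `make_packets` and `run_task_farm` are all hypothetical, and a real deployment would spread workers across nodes rather than threads in a single process.

```python
import queue
import threading
from typing import Dict, List, Tuple


def process_event(event_id: int) -> int:
    """Hypothetical stand-in for the per-event work an Athena worker would do."""
    return event_id * event_id  # placeholder "result"


def make_packets(n_events: int, packet_size: int) -> List[Tuple[int, int]]:
    """Split the event range [0, n_events) into half-open (start, stop) packets."""
    return [(start, min(start + packet_size, n_events))
            for start in range(0, n_events, packet_size)]


def run_task_farm(n_events: int, packet_size: int, n_workers: int) -> Dict[int, int]:
    """Hypothetical task farm: workers pull event packets from a shared queue."""
    work: queue.Queue = queue.Queue()
    for packet in make_packets(n_events, packet_size):
        work.put(packet)

    results: Dict[int, int] = {}
    lock = threading.Lock()

    def worker() -> None:
        # Long-lived worker: keeps requesting packets until none are left,
        # so a faster worker naturally ends up processing more packets.
        while True:
            try:
                start, stop = work.get_nowait()
            except queue.Empty:
                return
            partial = {e: process_event(e) for e in range(start, stop)}
            with lock:
                results.update(partial)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    out = run_task_farm(n_events=10, packet_size=3, n_workers=2)
    print(len(out))  # 10
```

Because idle workers pull the next packet from a shared queue instead of receiving a fixed share up front, work is balanced dynamically across workers of different speeds; that is one simple way to get the kind of fine-grained scheduling an event service aims for.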