Distributed Machine Learning Workflow with PanDA and iDDS in LHC ATLAS

Machine Learning (ML) has become one of the important tools for High Energy Physics analysis. As the size of the dataset increases at the Large Hadron Collider (LHC), and at the same time the search spaces become bigger and bigger in order to exploit the physics potentials, more and more computing r...

Bibliographic Details
Main authors: Guan, Wen; Maeno, Tadashi; Zhang, Rui; Weber, Christian; Wenaus, Torre; Alekseev, Aleksandr; Barreiro Megino, Fernando Harald; De, Kaushik; Karavakis, Edward; Klimentov, Alexei; Korchuganova, Tatiana; Lin, Fa-Hui; Nilsson, Paul; Yang, Zhaoyu; Zhao, Xin
Language: eng
Published: 2023
Subjects: Particle Physics - Experiment
Online access: http://cds.cern.ch/record/2868050
_version_ 1780978194708430848
author Guan, Wen
Maeno, Tadashi
Zhang, Rui
Weber, Christian
Wenaus, Torre
Alekseev, Aleksandr
Barreiro Megino, Fernando Harald
De, Kaushik
Karavakis, Edward
Klimentov, Alexei
Korchuganova, Tatiana
Lin, Fa-Hui
Nilsson, Paul
Yang, Zhaoyu
Zhao, Xin
author_sort Guan, Wen
collection CERN
description Machine Learning (ML) has become an important tool for High Energy Physics analysis. As datasets at the Large Hadron Collider (LHC) grow and search spaces expand to exploit the physics potential, ML tasks require more and more computing resources. In addition, complex advanced ML workflows are being developed in which one task may depend on the results of previous tasks. How to use the vast distributed CPUs and GPUs of the WLCG for these large, complex ML tasks has become an active topic. In this paper, we present our efforts to enable the execution of distributed ML workflows on the Production and Distributed Analysis (PanDA) system and the intelligent Data Delivery Service (iDDS). First, we describe how PanDA and iDDS handle large-scale ML workflows, including the implementation that processes workloads on diverse and geographically distributed computing resources. Next, we report real-world use cases, such as Hyperparameter Optimization, Monte Carlo Toy confidence limit calculation, and Active Learning. Finally, we conclude with future plans.
id cern-2868050
institution European Organization for Nuclear Research
language eng
publishDate 2023
record_format invenio
spelling cern-2868050
2023-08-21T20:59:25Z
http://cds.cern.ch/record/2868050
eng
Guan, Wen
Maeno, Tadashi
Zhang, Rui
Weber, Christian
Wenaus, Torre
Alekseev, Aleksandr
Barreiro Megino, Fernando Harald
De, Kaushik
Karavakis, Edward
Klimentov, Alexei
Korchuganova, Tatiana
Lin, Fa-Hui
Nilsson, Paul
Yang, Zhaoyu
Zhao, Xin
Distributed Machine Learning Workflow with PanDA and iDDS in LHC ATLAS
Particle Physics - Experiment
Machine Learning (ML) has become an important tool for High Energy Physics analysis. As datasets at the Large Hadron Collider (LHC) grow and search spaces expand to exploit the physics potential, ML tasks require more and more computing resources. In addition, complex advanced ML workflows are being developed in which one task may depend on the results of previous tasks. How to use the vast distributed CPUs and GPUs of the WLCG for these large, complex ML tasks has become an active topic. In this paper, we present our efforts to enable the execution of distributed ML workflows on the Production and Distributed Analysis (PanDA) system and the intelligent Data Delivery Service (iDDS). First, we describe how PanDA and iDDS handle large-scale ML workflows, including the implementation that processes workloads on diverse and geographically distributed computing resources. Next, we report real-world use cases, such as Hyperparameter Optimization, Monte Carlo Toy confidence limit calculation, and Active Learning. Finally, we conclude with future plans.
ATL-SOFT-PROC-2023-010
oai:cds.cern.ch:2868050
2023-08-21
title Distributed Machine Learning Workflow with PanDA and iDDS in LHC ATLAS
topic Particle Physics - Experiment
url http://cds.cern.ch/record/2868050