Distributed Machine Learning Workflow with PanDA and iDDS in LHC ATLAS
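Main authors: | Guan, Wen; Maeno, Tadashi; Zhang, Rui; Weber, Christian; Wenaus, Torre; Alekseev, Aleksandr; Barreiro Megino, Fernando Harald; De, Kaushik; Karavakis, Edward; Klimentov, Alexei; Korchuganova, Tatiana; Lin, Fa-Hui; Nilsson, Paul; Yang, Zhaoyu; Zhao, Xin |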
---|---|
Language: | eng |
Published: | 2023 |
Subjects: | Particle Physics - Experiment |
Online access: | http://cds.cern.ch/record/2868050 |
author | Guan, Wen; Maeno, Tadashi; Zhang, Rui; Weber, Christian; Wenaus, Torre; Alekseev, Aleksandr; Barreiro Megino, Fernando Harald; De, Kaushik; Karavakis, Edward; Klimentov, Alexei; Korchuganova, Tatiana; Lin, Fa-Hui; Nilsson, Paul; Yang, Zhaoyu; Zhao, Xin |
---|---|
collection | CERN |
description | Machine Learning (ML) has become an important tool for High Energy Physics analysis. As dataset sizes grow at the Large Hadron Collider (LHC), and search spaces expand to exploit the full physics potential, processing these ML tasks requires ever more computing resources. In addition, complex ML workflows are being developed in which one task may depend on the results of previous tasks. How to use the vast distributed CPU/GPU resources of the Worldwide LHC Computing Grid (WLCG) for such large, complex ML tasks has become an active area of work. In this paper, we present our efforts to enable the execution of distributed ML workflows on the Production and Distributed Analysis (PanDA) system and the intelligent Data Delivery Service (iDDS). First, we describe how PanDA and iDDS handle large-scale ML workflows, including the implementation for processing workloads on diverse, geographically distributed computing resources. Next, we report on real-world use cases such as Hyperparameter Optimization, Monte Carlo toy confidence-limit calculation, and Active Learning. Finally, we conclude with future plans. |
id | cern-2868050 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2023 |
record_format | invenio |
report_number | ATL-SOFT-PROC-2023-010 |
oai_identifier | oai:cds.cern.ch:2868050 |
record_timestamp | 2023-08-21T20:59:25Z |
title | Distributed Machine Learning Workflow with PanDA and iDDS in LHC ATLAS |
topic | Particle Physics - Experiment |
url | http://cds.cern.ch/record/2868050 |