Distributed Machine Learning with PanDA and iDDS in LHC ATLAS
Main authors: | Guan, Wen; Maeno, Tadashi; Weber, Christian; Zhang, Rui; Wenaus, Torre; Karavakis, Edward; De, Kaushik; Klimentov, Alexei; Lin, Fa-Hui; Barreiro Megino, Fernando Harald; Nilsson, Paul; Yang, Zhaoyu; Zhao, Xin |
---|---|
Language: | eng |
Published: | 2023 |
Subjects: | Particle Physics - Experiment |
Online access: | http://cds.cern.ch/record/2857629 |
_version_ | 1780977572583047168 |
---|---|
author | Guan, Wen Maeno, Tadashi Weber, Christian Zhang, Rui Wenaus, Torre Karavakis, Edward De, Kaushik Klimentov, Alexei Lin, Fa-Hui Barreiro Megino, Fernando Harald Nilsson, Paul Yang, Zhaoyu Zhao, Xin |
author_facet | Guan, Wen Maeno, Tadashi Weber, Christian Zhang, Rui Wenaus, Torre Karavakis, Edward De, Kaushik Klimentov, Alexei Lin, Fa-Hui Barreiro Megino, Fernando Harald Nilsson, Paul Yang, Zhaoyu Zhao, Xin |
author_sort | Guan, Wen |
collection | CERN |
description | Machine learning has become one of the most important tools for High Energy Physics analysis. As dataset sizes increase at the Large Hadron Collider (LHC), and search spaces grow ever larger to exploit the physics potential, more and more computing resources are required to process these machine learning tasks. In addition, complex, advanced machine learning workflows are being developed in which one task may depend on the results of previous tasks. Making use of the vast distributed CPUs and GPUs in the WLCG for these large, complex machine learning tasks has become an active area of work. In this presentation, we present our efforts on distributed machine learning with PanDA and iDDS (intelligent Data Delivery Service). We first address the difficulties of running machine learning tasks on distributed WLCG resources. We then present our implementation, based on DAGs (Directed Acyclic Graphs) and sliced parameters in iDDS, which distributes machine learning tasks to distributed computing resources and executes them in parallel through PanDA. Next we demonstrate several use cases we have implemented, such as Hyperparameter Optimization, Monte Carlo toy confidence-limit calculation, and Active Learning. Finally we describe directions for future work. |
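The pattern the abstract describes — slicing a parameter search space into independent sub-tasks, running them in parallel, and gating a dependent task on their completion — can be illustrated with a minimal, self-contained Python sketch. This is NOT the actual iDDS or PanDA API; the function names, the thread-pool stand-in for distributed worker nodes, and the quadratic toy loss are all assumptions made purely for illustration of the two-node DAG idea (scan stage, then a selection stage that depends on every slice).

```python
# Illustrative sketch only: this mimics the "sliced parameters + DAG" idea,
# not the real iDDS/PanDA submission interface.
import itertools
from concurrent.futures import ThreadPoolExecutor


def train_stub(params):
    """Stand-in for one training sub-task on a worker node.

    Returns (loss, params); the quadratic "loss" is purely illustrative.
    """
    lr, batch = params
    loss = (lr - 0.01) ** 2 + (batch - 64) ** 2 * 1e-6
    return loss, params


def run_sliced_scan(learning_rates, batch_sizes, max_workers=4):
    # "Slice" the hyperparameter space into independent grid points,
    # one sub-task per point.
    grid = list(itertools.product(learning_rates, batch_sizes))

    # First DAG node: run every slice in parallel (here on threads,
    # standing in for distributed WLCG resources).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(train_stub, grid))

    # Dependent DAG node: runs only once every slice has finished,
    # selecting the best point from the collected results.
    best_loss, best_params = min(results)
    return best_params


best = run_sliced_scan([0.1, 0.01, 0.001], [32, 64, 128])
```

With the illustrative loss above, the scan selects the grid point `(0.01, 64)`; in a real deployment each slice would be a PanDA job and the selection stage a downstream task in the iDDS DAG.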
id | cern-2857629 |
institution | European Organization for Nuclear Research (CERN) |
language | eng |
publishDate | 2023 |
record_format | invenio |
spelling | cern-28576292023-05-04T18:19:47Zhttp://cds.cern.ch/record/2857629engGuan, WenMaeno, TadashiWeber, ChristianZhang, RuiWenaus, TorreKaravakis, EdwardDe, KaushikKlimentov, AlexeiLin, Fa-HuiBarreiro Megino, Fernando HaraldNilsson, PaulYang, ZhaoyuZhao, XinDistributed Machine Learning with PanDA and iDDS in LHC ATLASParticle Physics - ExperimentMachine learning has become one of the important tools for High Energy Physics analysis. As the size of the dataset increases at the Large Hadron Collider (LHC), and at the same time the search spaces become bigger and bigger in order to exploit the physics potentials, more and more computing resources are required for processing these machine learning tasks. In addition, complex advanced machine learning workflows are developed in which one task may depend on the results of previous tasks. How to make use of vast distributed CPUs/GPUs in WLCG for these big complex machine learning tasks has become a popular area. In this presentation, we will present our efforts on distributed machine learning in PanDA and iDDS (intelligent Data Delivery Service). We will at first address the difficulties to run machine learning tasks on distributed WLCG resources. Then we will present our implementation with DAG (Directed Acyclic Graph) and sliced parameters in iDDS to distribute machine learning tasks to distributed computing resources to execute them in parallel through PanDA. Next we will demonstrate some use cases we have implemented, such as Hyperparameter Optimization, Monte Carlo Toy confidence limits calculation and Active Learning. Finally we will describe some directions to perform in the future.ATL-SOFT-SLIDE-2023-128oai:cds.cern.ch:28576292023-05-03 |
spellingShingle | Particle Physics - Experiment Guan, Wen Maeno, Tadashi Weber, Christian Zhang, Rui Wenaus, Torre Karavakis, Edward De, Kaushik Klimentov, Alexei Lin, Fa-Hui Barreiro Megino, Fernando Harald Nilsson, Paul Yang, Zhaoyu Zhao, Xin Distributed Machine Learning with PanDA and iDDS in LHC ATLAS |
title | Distributed Machine Learning with PanDA and iDDS in LHC ATLAS |
title_full | Distributed Machine Learning with PanDA and iDDS in LHC ATLAS |
title_fullStr | Distributed Machine Learning with PanDA and iDDS in LHC ATLAS |
title_full_unstemmed | Distributed Machine Learning with PanDA and iDDS in LHC ATLAS |
title_short | Distributed Machine Learning with PanDA and iDDS in LHC ATLAS |
title_sort | distributed machine learning with panda and idds in lhc atlas |
topic | Particle Physics - Experiment |
url | http://cds.cern.ch/record/2857629 |
work_keys_str_mv | AT guanwen distributedmachinelearningwithpandaandiddsinlhcatlas AT maenotadashi distributedmachinelearningwithpandaandiddsinlhcatlas AT weberchristian distributedmachinelearningwithpandaandiddsinlhcatlas AT zhangrui distributedmachinelearningwithpandaandiddsinlhcatlas AT wenaustorre distributedmachinelearningwithpandaandiddsinlhcatlas AT karavakisedward distributedmachinelearningwithpandaandiddsinlhcatlas AT dekaushik distributedmachinelearningwithpandaandiddsinlhcatlas AT klimentovalexei distributedmachinelearningwithpandaandiddsinlhcatlas AT linfahui distributedmachinelearningwithpandaandiddsinlhcatlas AT barreiromeginofernandoharald distributedmachinelearningwithpandaandiddsinlhcatlas AT nilssonpaul distributedmachinelearningwithpandaandiddsinlhcatlas AT yangzhaoyu distributedmachinelearningwithpandaandiddsinlhcatlas AT zhaoxin distributedmachinelearningwithpandaandiddsinlhcatlas |