
Distributed Machine Learning with PanDA and iDDS in LHC ATLAS


Bibliographic Details
Main Authors: Guan, Wen, Maeno, Tadashi, Weber, Christian, Zhang, Rui, Wenaus, Torre, Karavakis, Edward, De, Kaushik, Klimentov, Alexei, Lin, Fa-Hui, Barreiro Megino, Fernando Harald, Nilsson, Paul, Yang, Zhaoyu, Zhao, Xin
Language: eng
Published: 2023
Subjects: Particle Physics - Experiment
Online Access: http://cds.cern.ch/record/2857629
author Guan, Wen
Maeno, Tadashi
Weber, Christian
Zhang, Rui
Wenaus, Torre
Karavakis, Edward
De, Kaushik
Klimentov, Alexei
Lin, Fa-Hui
Barreiro Megino, Fernando Harald
Nilsson, Paul
Yang, Zhaoyu
Zhao, Xin
author_sort Guan, Wen
collection CERN
description Machine learning has become one of the essential tools for High Energy Physics analysis. As dataset sizes grow at the Large Hadron Collider (LHC), and search spaces expand to exploit the full physics potential, processing these machine learning tasks demands ever more computing resources. In addition, complex machine learning workflows are being developed in which one task may depend on the results of previous ones. Harnessing the vast distributed CPU and GPU resources of the WLCG for such large, complex machine learning tasks has therefore become an active area of work. In this presentation, we describe our efforts on distributed machine learning with PanDA and iDDS (intelligent Data Delivery Service). We first address the difficulties of running machine learning tasks on distributed WLCG resources. We then present our implementation, which uses DAGs (Directed Acyclic Graphs) and sliced parameters in iDDS to distribute machine learning tasks across computing resources and execute them in parallel through PanDA. Next, we demonstrate several use cases we have implemented, such as hyperparameter optimization, Monte Carlo toy confidence-limit calculations, and active learning. Finally, we outline directions for future work.
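The sliced-parameter DAG idea in the abstract can be sketched in plain Python. This is an illustrative model only, not the actual PanDA or iDDS API: the function names (`evaluate_slice`, `slice_parameters`, `run_dag`), the toy loss surface, and the use of a local thread pool as a stand-in for distributed grid jobs are all invented for this example. A search space is split into slices, each slice runs as an independent parallel task, and a final dependent node merges the results, mirroring a two-level DAG.

```python
from concurrent.futures import ThreadPoolExecutor  # local stand-in for distributed workers


def evaluate_slice(params):
    """Hypothetical per-slice work: score each hyperparameter value.
    In the real system each slice would run as an independent grid job."""
    return {p: (p - 0.3) ** 2 for p in params}  # toy loss surface, minimum at 0.3


def slice_parameters(space, n_slices):
    """Split a parameter list into n_slices roughly equal slices, one per task."""
    return [space[i::n_slices] for i in range(n_slices)]


def run_dag(space, n_slices=4):
    # Independent DAG nodes: slice-evaluation tasks run in parallel.
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        results = pool.map(evaluate_slice, slice_parameters(space, n_slices))
    # Final DAG node: depends on all slice tasks and merges their outputs.
    merged = {}
    for partial in results:
        merged.update(partial)
    return min(merged, key=merged.get)  # best hyperparameter found


best = run_dag([i / 10 for i in range(11)])  # grid 0.0 .. 1.0 → 0.3
```

In the presented system the parallel nodes would be PanDA jobs scheduled on WLCG resources, with iDDS tracking the DAG dependencies; the thread pool here only emulates that concurrency locally.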
id cern-2857629
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2023
record_format invenio
last_modified 2023-05-04T18:19:47Z
report_number ATL-SOFT-SLIDE-2023-128
oai_id oai:cds.cern.ch:2857629
date 2023-05-03
title Distributed Machine Learning with PanDA and iDDS in LHC ATLAS
title_sort distributed machine learning with panda and idds in lhc atlas
topic Particle Physics - Experiment
url http://cds.cern.ch/record/2857629