
Preparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputers


Bibliographic Details
Main author: Sørlie, Lars
Language: eng
Published: NTNU 2022
Subjects:
Online access: http://cds.cern.ch/record/2839735
_version_ 1780975978661543936
author Sørlie, Lars
author_facet Sørlie, Lars
author_sort Sørlie, Lars
collection CERN
description There is increasing interest in demonstrating the importance of HPC for Artificial Intelligence and of Artificial Intelligence for HPC. HPC centers now boast larger and larger compute power, with more centers reaching exascale. CERN is investigating the use of Artificial Intelligence and Machine Learning to augment or replace some of the traditional workflows within the LHC experiments. An advantage of Machine Learning and Artificial Intelligence is their highly parallelizable nature on suitable hardware, such as GPUs. MLPF, like every other large-scale model, requires large compute resources in order to become efficient and accurate. Models and their datasets keep increasing in size, further expanding their need for compute resources. The work in this thesis includes implementing a distributed version of the Graph Neural Network MLPF using the Horovod framework, with the aim of scaling the application to exascale-class supercomputers. Horovod is a well-established framework for distributed workloads in the Artificial Intelligence field. Our work uses the Horovod framework to distribute the work across up to 292 supercomputer nodes, each with up to 4 GPUs, i.e. runs with over 1100 GPUs on the Jülich Juwels supercomputer. Our work focuses on experiments on the Nvidia Volta and Ampere architectures, as these were the best available to us during this thesis. During these scaling tests we observe that the performance scales well up to 24 nodes. In particular, we see a speedup of up to 20X on 24 nodes, whereas the speedup reaches only 50X when going to 100 nodes. The thesis also includes comparisons of scaling performance between the Volta and Ampere GPU architectures.
Our results also show that the difference between using Volta and Ampere GPUs diminishes as one scales to many nodes, a tendency noticed after only 8 nodes, where the difference becomes only 36 seconds per epoch, whereas the timings on a single node are 485 seconds and 200 seconds, respectively, for a single Volta and Ampere GPU. Some suggestions for future work are also included.
id cern-2839735
institution European Organization for Nuclear Research
language eng
publishDate 2022
publisher NTNU
record_format invenio
spelling cern-28397352022-11-08T22:14:31Zhttp://cds.cern.ch/record/2839735engSørlie, LarsPreparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputersComputing and ComputersThere is increasing interest in demonstrating the importance of HPC for Artificial Intelligence and of Artificial Intelligence for HPC. HPC centers now boast larger and larger compute power, with more centers reaching exascale. CERN is investigating the use of Artificial Intelligence and Machine Learning to augment or replace some of the traditional workflows within the LHC experiments. An advantage of Machine Learning and Artificial Intelligence is their highly parallelizable nature on suitable hardware, such as GPUs. MLPF, like every other large-scale model, requires large compute resources in order to become efficient and accurate. Models and their datasets keep increasing in size, further expanding their need for compute resources. The work in this thesis includes implementing a distributed version of the Graph Neural Network MLPF using the Horovod framework, with the aim of scaling the application to exascale-class supercomputers. Horovod is a well-established framework for distributed workloads in the Artificial Intelligence field. Our work uses the Horovod framework to distribute the work across up to 292 supercomputer nodes, each with up to 4 GPUs, i.e. runs with over 1100 GPUs on the Jülich Juwels supercomputer. Our work focuses on experiments on the Nvidia Volta and Ampere architectures, as these were the best available to us during this thesis. During these scaling tests we observe that the performance scales well up to 24 nodes. In particular, we see a speedup of up to 20X on 24 nodes, whereas the speedup reaches only 50X when going to 100 nodes. The thesis also includes comparisons of scaling performance between the Volta and Ampere GPU architectures.
Our results also show that the difference between using Volta and Ampere GPUs diminishes as one scales to many nodes, a tendency noticed after only 8 nodes, where the difference becomes only 36 seconds per epoch, whereas the timings on a single node are 485 seconds and 200 seconds, respectively, for a single Volta and Ampere GPU. Some suggestions for future work are also included.NTNUCERN-THESIS-2022-182oai:cds.cern.ch:28397352022-11-01
spellingShingle Computing and Computers
Sørlie, Lars
Preparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputers
title Preparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputers
title_full Preparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputers
title_fullStr Preparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputers
title_full_unstemmed Preparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputers
title_short Preparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputers
title_sort preparing the cern machine-learned particle-flow model for exascale using horovod: experience and performance studies on the flatiron and jülich supercomputers
topic Computing and Computers
url http://cds.cern.ch/record/2839735
work_keys_str_mv AT sørlielars preparingthecernmachinelearnedparticleflowmodelforexascaleusinghorovodexperienceandperformancestudiesontheflatironandjulichsupercomputers