Preparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputers
Main author: | Sørlie, Lars |
---|---|
Language: | eng |
Published: | NTNU, 2022 |
Subjects: | Computing and Computers |
Online access: | http://cds.cern.ch/record/2839735 |
_version_ | 1780975978661543936 |
---|---|
author | Sørlie, Lars |
author_facet | Sørlie, Lars |
author_sort | Sørlie, Lars |
collection | CERN |
description | There is increasing interest in demonstrating the importance of HPC for Artificial Intelligence and of Artificial Intelligence for HPC. HPC centers are boasting ever larger compute power, with more centers reaching exascale. CERN is investigating the use of Artificial Intelligence and Machine Learning to augment or replace some of the traditional workflows within the LHC experiments. An advantage of Machine Learning and Artificial Intelligence is their highly parallelizable nature on suitable hardware, such as GPUs. MLPF, like every other large-scale model, requires large compute resources in order to become efficient and accurate. Models and their datasets keep increasing in size, further expanding their need for compute resources. The work in this thesis includes implementing a distributed version of the Graph Neural Network MLPF using the Horovod framework, with the aim of scaling the application to exascale-class supercomputers. Horovod is a well-established framework for distributed workloads within the Artificial Intelligence field. Our work uses the Horovod framework to distribute the training across up to 292 supercomputer nodes, each with up to 4 GPUs, i.e. runs with over 1100 GPUs on the Jülich Juwels supercomputer. Our work focuses on experiments on the Nvidia Volta and Ampere architectures, as these were the best available to us during this thesis. During these scaling tests we observe that the performance scales well up to 24 nodes. In particular, we see a speedup of up to 20X on 24 nodes, whereas the speedup only reaches 50X when going to 100 nodes. The thesis also includes comparisons of scaling performance between the Volta and Ampere GPU architectures. Our results also show that the difference between using Volta and Ampere GPUs diminishes as one scales to many nodes, a tendency noticed already at 8 nodes, where the difference is only 36 seconds per epoch, whereas the timings on a single node are 485 seconds and 200 seconds for a single Volta and Ampere GPU, respectively. Some suggestions for future work are also included. |
id | cern-2839735 |
institution | Organización Europea para la Investigación Nuclear |
language | eng |
publishDate | 2022 |
publisher | NTNU |
record_format | invenio |
spelling | cern-2839735 2022-11-08T22:14:31Z http://cds.cern.ch/record/2839735 eng Sørlie, Lars Preparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputers Computing and Computers There is increasing interest in demonstrating the importance of HPC for Artificial Intelligence and of Artificial Intelligence for HPC. HPC centers are boasting ever larger compute power, with more centers reaching exascale. CERN is investigating the use of Artificial Intelligence and Machine Learning to augment or replace some of the traditional workflows within the LHC experiments. An advantage of Machine Learning and Artificial Intelligence is their highly parallelizable nature on suitable hardware, such as GPUs. MLPF, like every other large-scale model, requires large compute resources in order to become efficient and accurate. Models and their datasets keep increasing in size, further expanding their need for compute resources. The work in this thesis includes implementing a distributed version of the Graph Neural Network MLPF using the Horovod framework, with the aim of scaling the application to exascale-class supercomputers. Horovod is a well-established framework for distributed workloads within the Artificial Intelligence field. Our work uses the Horovod framework to distribute the training across up to 292 supercomputer nodes, each with up to 4 GPUs, i.e. runs with over 1100 GPUs on the Jülich Juwels supercomputer. Our work focuses on experiments on the Nvidia Volta and Ampere architectures, as these were the best available to us during this thesis. During these scaling tests we observe that the performance scales well up to 24 nodes. In particular, we see a speedup of up to 20X on 24 nodes, whereas the speedup only reaches 50X when going to 100 nodes. The thesis also includes comparisons of scaling performance between the Volta and Ampere GPU architectures. Our results also show that the difference between using Volta and Ampere GPUs diminishes as one scales to many nodes, a tendency noticed already at 8 nodes, where the difference is only 36 seconds per epoch, whereas the timings on a single node are 485 seconds and 200 seconds for a single Volta and Ampere GPU, respectively. Some suggestions for future work are also included. NTNU CERN-THESIS-2022-182 oai:cds.cern.ch:2839735 2022-11-01 |
spellingShingle | Computing and Computers Sørlie, Lars Preparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputers |
title | Preparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputers |
title_full | Preparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputers |
title_fullStr | Preparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputers |
title_full_unstemmed | Preparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputers |
title_short | Preparing the CERN machine-learned particle-flow model for Exascale using Horovod: Experience and performance studies on the Flatiron and Jülich supercomputers |
title_sort | preparing the cern machine-learned particle-flow model for exascale using horovod: experience and performance studies on the flatiron and jülich supercomputers |
topic | Computing and Computers |
url | http://cds.cern.ch/record/2839735 |
work_keys_str_mv | AT sørlielars preparingthecernmachinelearnedparticleflowmodelforexascaleusinghorovodexperienceandperformancestudiesontheflatironandjulichsupercomputers |
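The abstract above describes distributing MLPF training with Horovod across multi-GPU nodes. As an illustration only, the sketch below shows the usual Horovod + TensorFlow/Keras data-parallel pattern such a setup follows (per-rank GPU pinning, learning-rate scaling, an allreduce-wrapped optimizer, rank-0-only checkpointing); the `build_model` function, dataset, learning rate, and checkpoint path are placeholders and are not taken from the thesis.

```python
# Minimal sketch of Horovod data-parallel training with TensorFlow/Keras.
# The model and dataset below are placeholders, not the actual MLPF code.
import tensorflow as tf
import horovod.tensorflow.keras as hvd

# Initialize Horovod (one process per GPU, typically launched via horovodrun or srun).
hvd.init()

# Pin each process to a single local GPU and enable memory growth.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    local_gpu = gpus[hvd.local_rank()]
    tf.config.set_visible_devices(local_gpu, "GPU")
    tf.config.experimental.set_memory_growth(local_gpu, True)

def build_model():
    # Placeholder network; any Keras model is wired into Horovod the same way.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(8),
    ])

model = build_model()

# Scale the learning rate by the number of workers and wrap the optimizer so
# gradients are averaged across all ranks with allreduce.
opt = tf.keras.optimizers.Adam(learning_rate=1e-4 * hvd.size())
opt = hvd.DistributedOptimizer(opt)

model.compile(loss="mse", optimizer=opt)

callbacks = [
    # Broadcast initial weights from rank 0 so all workers start identically.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
    # Average validation metrics across workers.
    hvd.callbacks.MetricAverageCallback(),
]
# Only rank 0 writes checkpoints to avoid different ranks clobbering files.
if hvd.rank() == 0:
    callbacks.append(tf.keras.callbacks.ModelCheckpoint("checkpoint-{epoch}.h5"))

# Each rank trains on its own shard of the data (real sharding logic omitted).
dataset = (
    tf.data.Dataset.from_tensor_slices(
        (tf.random.normal((1024, 32)), tf.random.normal((1024, 8)))
    )
    .shard(hvd.size(), hvd.rank())
    .batch(64)
)

model.fit(dataset, epochs=3, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```

For context on the reported numbers: if the single-node run is taken as the baseline, a speedup of 20X on 24 nodes corresponds to a parallel efficiency of roughly 83% (20/24), while 50X on 100 nodes corresponds to 50% (50/100), which is the kind of scaling degradation the abstract describes beyond 24 nodes.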