Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks

Bibliographic Details
Main Authors: Hesam, Ahmad, Vallecorsa, Sofia, Khattak, Gulrukh, Carminati, Federico
Language: eng
Published: 2019
Subjects:
Online Access: https://dx.doi.org/10.1007/978-3-030-34356-9_32
http://cds.cern.ch/record/2799446
_version_ 1780972544875036672
author Hesam, Ahmad
Vallecorsa, Sofia
Khattak, Gulrukh
Carminati, Federico
author_facet Hesam, Ahmad
Vallecorsa, Sofia
Khattak, Gulrukh
Carminati, Federico
author_sort Hesam, Ahmad
collection CERN
description The increased availability of High-Performance Computing resources enables data scientists to deploy and evaluate data-driven approaches, notably in the field of deep learning, at a rapid pace. As deep neural networks become more complex and ingest increasingly large datasets, it becomes impractical to perform the training phase on single-machine instances, due to memory constraints and extremely long training times. Rather than scaling up, scaling out the computing resources is a productive approach to improving performance. The paradigm of data parallelism allows us to split the training dataset into manageable chunks that can be processed in parallel. In this work, we evaluate the scaling performance of training a 3D generative adversarial network (GAN) on an IBM POWER8 cluster equipped with 12 NVIDIA P100 GPUs. The full training duration of the GAN, including evaluation, is reduced from 20 h and 16 min on a single GPU to 2 h and 14 min on all 12 GPUs. We achieve a scaling efficiency of 98.9% when scaling from 1 to 12 GPUs, taking only the training process into consideration.
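
A quick worked check of the timings quoted above (an illustration, not part of the record): only the end-to-end times, 20 h 16 min on 1 GPU and 2 h 14 min on 12 GPUs, are given, and these include evaluation; the training-only times behind the 98.9% figure are not quoted.

    # Scaling efficiency from the end-to-end timings in the abstract.
    # Note these include evaluation; the 98.9% figure quoted above
    # excludes it, and the training-only breakdown is not given here.
    def scaling_efficiency(t_single, t_parallel, n_gpus):
        """Speedup over one GPU, divided by the number of GPUs."""
        return (t_single / t_parallel) / n_gpus

    t1 = 20 * 60 + 16   # minutes, full run on 1 GPU
    t12 = 2 * 60 + 14   # minutes, full run on 12 GPUs

    print(f"speedup:    {t1 / t12:.2f}x")                        # ~9.07x
    print(f"efficiency: {scaling_efficiency(t1, t12, 12):.1%}")  # ~75.6%

Run as-is, this prints a speedup of roughly 9.1x and an efficiency of roughly 75.6% for the full run including evaluation, consistent with the 98.9% figure applying to the training process alone.

The data-parallel scheme the abstract describes can be sketched as follows: each GPU holds a full replica of the model, receives a disjoint shard of the dataset, and gradients are averaged across replicas after every batch. This is a hypothetical PyTorch illustration; the record does not name the paper's actual framework, and a GAN would run two such loops, one per network.

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    def train(model, dataset, epochs=1):
        # One process per GPU; launch with torchrun so that rank and
        # world size are set in the environment.
        dist.init_process_group("nccl")
        rank = dist.get_rank()
        torch.cuda.set_device(rank)
        model = DDP(model.cuda(rank), device_ids=[rank])
        # DistributedSampler hands each rank a disjoint chunk of the data.
        sampler = DistributedSampler(dataset)
        loader = DataLoader(dataset, batch_size=32, sampler=sampler)
        opt = torch.optim.Adam(model.parameters())
        loss_fn = torch.nn.MSELoss()
        for epoch in range(epochs):
            sampler.set_epoch(epoch)  # reshuffle the shards each epoch
            for x, y in loader:
                opt.zero_grad()
                loss = loss_fn(model(x.cuda(rank)), y.cuda(rank))
                loss.backward()       # DDP all-reduces gradients here
                opt.step()
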
id cern-2799446
institution European Organization for Nuclear Research
language eng
publishDate 2019
record_format invenio
spelling cern-2799446 2022-01-13T20:45:54Z doi:10.1007/978-3-030-34356-9_32 http://cds.cern.ch/record/2799446 eng Hesam, Ahmad; Vallecorsa, Sofia; Khattak, Gulrukh; Carminati, Federico. Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks. Computing and Computers. The increased availability of High-Performance Computing resources enables data scientists to deploy and evaluate data-driven approaches, notably in the field of deep learning, at a rapid pace. As deep neural networks become more complex and ingest increasingly large datasets, it becomes impractical to perform the training phase on single-machine instances, due to memory constraints and extremely long training times. Rather than scaling up, scaling out the computing resources is a productive approach to improving performance. The paradigm of data parallelism allows us to split the training dataset into manageable chunks that can be processed in parallel. In this work, we evaluate the scaling performance of training a 3D generative adversarial network (GAN) on an IBM POWER8 cluster equipped with 12 NVIDIA P100 GPUs. The full training duration of the GAN, including evaluation, is reduced from 20 h and 16 min on a single GPU to 2 h and 14 min on all 12 GPUs. We achieve a scaling efficiency of 98.9% when scaling from 1 to 12 GPUs, taking only the training process into consideration. oai:cds.cern.ch:2799446 2019
spellingShingle Computing and Computers
Hesam, Ahmad
Vallecorsa, Sofia
Khattak, Gulrukh
Carminati, Federico
Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks
title Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks
title_full Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks
title_fullStr Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks
title_full_unstemmed Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks
title_short Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks
title_sort evaluating power architecture for distributed training of generative adversarial networks
topic Computing and Computers
url https://dx.doi.org/10.1007/978-3-030-34356-9_32
http://cds.cern.ch/record/2799446
work_keys_str_mv AT hesamahmad evaluatingpowerarchitecturefordistributedtrainingofgenerativeadversarialnetworks
AT vallecorsasofia evaluatingpowerarchitecturefordistributedtrainingofgenerativeadversarialnetworks
AT khattakgulrukh evaluatingpowerarchitecturefordistributedtrainingofgenerativeadversarialnetworks
AT carminatifederico evaluatingpowerarchitecturefordistributedtrainingofgenerativeadversarialnetworks