Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks

Bibliographic Details
Main Authors: Hesam, Ahmad, Vallecorsa, Sofia, Khattak, Gulrukh, Carminati, Federico
Language: eng
Published: 2019
Subjects:
Online Access: https://dx.doi.org/10.1007/978-3-030-34356-9_32
http://cds.cern.ch/record/2799446
_version_ 1780972544875036672
author Hesam, Ahmad
Vallecorsa, Sofia
Khattak, Gulrukh
Carminati, Federico
author_facet Hesam, Ahmad
Vallecorsa, Sofia
Khattak, Gulrukh
Carminati, Federico
author_sort Hesam, Ahmad
collection CERN
description The increased availability of High-Performance Computing resources enables data scientists to deploy and evaluate data-driven approaches, notably in the field of deep learning, at a rapid pace. As deep neural networks become more complex and ingest increasingly large datasets, it becomes impractical to perform the training phase on single-machine instances, due to memory constraints and extremely long training times. Rather than scaling up, scaling out the computing resources is a productive approach to improving performance. The paradigm of data parallelism allows us to split the training dataset into manageable chunks that can be processed in parallel. In this work, we evaluate the scaling performance of training a 3D generative adversarial network (GAN) on an IBM POWER8 cluster equipped with 12 NVIDIA P100 GPUs. The full training duration of the GAN, including evaluation, is reduced from 20 h and 16 min on a single GPU to 2 h and 14 min on all 12 GPUs. We achieve a scaling efficiency of 98.9% when scaling from 1 to 12 GPUs, taking only the training process into consideration.
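
A quick worked check of the timings quoted above (an illustration, not part of the record): only the end-to-end times, 20 h 16 min on 1 GPU and 2 h 14 min on 12 GPUs, are given, and these include evaluation; the training-only times behind the 98.9% figure are not quoted.

    # Scaling efficiency from the end-to-end timings in the abstract.
    # Note these include evaluation; the 98.9% figure quoted above
    # excludes it, and the training-only breakdown is not given here.
    def scaling_efficiency(t_single, t_parallel, n_gpus):
        """Speedup over one GPU, divided by the number of GPUs."""
        return (t_single / t_parallel) / n_gpus

    t1 = 20 * 60 + 16   # minutes, full run on 1 GPU
    t12 = 2 * 60 + 14   # minutes, full run on 12 GPUs

    print(f"speedup:    {t1 / t12:.2f}x")                        # ~9.07x
    print(f"efficiency: {scaling_efficiency(t1, t12, 12):.1%}")  # ~75.6%

Run as-is, this prints a speedup of roughly 9.1x and an efficiency of roughly 75.6% for the full run including evaluation, consistent with the 98.9% figure applying to the training process alone.

The data-parallel scheme the abstract describes can be sketched as follows: each GPU holds a full replica of the model, receives a disjoint shard of the dataset, and gradients are averaged across replicas after every batch. This is a hypothetical PyTorch illustration; the record does not name the paper's actual framework, and a GAN would run two such loops, one per network.

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    def train(model, dataset, epochs=1):
        # One process per GPU; launch with torchrun so that rank and
        # world size are set in the environment.
        dist.init_process_group("nccl")
        rank = dist.get_rank()
        torch.cuda.set_device(rank)
        model = DDP(model.cuda(rank), device_ids=[rank])
        # DistributedSampler hands each rank a disjoint chunk of the data.
        sampler = DistributedSampler(dataset)
        loader = DataLoader(dataset, batch_size=32, sampler=sampler)
        opt = torch.optim.Adam(model.parameters())
        loss_fn = torch.nn.MSELoss()
        for epoch in range(epochs):
            sampler.set_epoch(epoch)  # reshuffle the shards each epoch
            for x, y in loader:
                opt.zero_grad()
                loss = loss_fn(model(x.cuda(rank)), y.cuda(rank))
                loss.backward()       # DDP all-reduces gradients here
                opt.step()
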
id cern-2799446
institution European Organization for Nuclear Research
language eng
publishDate 2019
record_format invenio
spelling cern-2799446 2022-01-13T20:45:54Z doi:10.1007/978-3-030-34356-9_32 http://cds.cern.ch/record/2799446 eng Hesam, Ahmad; Vallecorsa, Sofia; Khattak, Gulrukh; Carminati, Federico. Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks. Computing and Computers. The increased availability of High-Performance Computing resources enables data scientists to deploy and evaluate data-driven approaches, notably in the field of deep learning, at a rapid pace. As deep neural networks become more complex and ingest increasingly large datasets, it becomes impractical to perform the training phase on single-machine instances, due to memory constraints and extremely long training times. Rather than scaling up, scaling out the computing resources is a productive approach to improving performance. The paradigm of data parallelism allows us to split the training dataset into manageable chunks that can be processed in parallel. In this work, we evaluate the scaling performance of training a 3D generative adversarial network (GAN) on an IBM POWER8 cluster equipped with 12 NVIDIA P100 GPUs. The full training duration of the GAN, including evaluation, is reduced from 20 h and 16 min on a single GPU to 2 h and 14 min on all 12 GPUs. We achieve a scaling efficiency of 98.9% when scaling from 1 to 12 GPUs, taking only the training process into consideration. oai:cds.cern.ch:2799446 2019
spellingShingle Computing and Computers
Hesam, Ahmad
Vallecorsa, Sofia
Khattak, Gulrukh
Carminati, Federico
Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks
title Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks
title_full Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks
title_fullStr Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks
title_full_unstemmed Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks
title_short Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks
title_sort evaluating power architecture for distributed training of generative adversarial networks
topic Computing and Computers
url https://dx.doi.org/10.1007/978-3-030-34356-9_32
http://cds.cern.ch/record/2799446
work_keys_str_mv AT hesamahmad evaluatingpowerarchitecturefordistributedtrainingofgenerativeadversarialnetworks
AT vallecorsasofia evaluatingpowerarchitecturefordistributedtrainingofgenerativeadversarialnetworks
AT khattakgulrukh evaluatingpowerarchitecturefordistributedtrainingofgenerativeadversarialnetworks
AT carminatifederico evaluatingpowerarchitecturefordistributedtrainingofgenerativeadversarialnetworks