
Optimal Topology Search for Fast Model Averaging in Decentralized Parallel SGD

Distributed training of deep learning models on high-latency systems necessitates decentralized parallel SGD solutions. However, existing solutions suffer from slow convergence because of hand-crafted topologies. The question arises, “for decentralized parallel SGD, is it possible to learn a topolog...


Bibliographic Details
Main Authors: Jameel, Mohsan; Jawed, Shayan; Schmidt-Thieme, Lars
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206308/
http://dx.doi.org/10.1007/978-3-030-47436-2_67
author Jameel, Mohsan
Jawed, Shayan
Schmidt-Thieme, Lars
collection PubMed
description Distributed training of deep learning models on high-latency systems necessitates decentralized parallel SGD solutions. However, existing solutions suffer from slow convergence because of hand-crafted topologies. The question arises: “for decentralized parallel SGD, is it possible to learn a topology that provides faster model averaging compared to the hand-crafted counterparts?” By leveraging spectral properties of the graph, we formulate the objective function for finding the topology that provides fast model averaging. Since direct optimization of the objective function is infeasible, we employ a local search algorithm guided by the objective function. We show through extensive empirical evaluation on image classification tasks that the model averaging based on learned topologies leads to fast convergence. An equally important aspect of decentralized parallel SGD is the choice of link weights for sparse model averaging. In contrast to setting weights via Metropolis-Hastings, we propose to use Laplacian link weights on the learned topologies, which provide a significant lift in performance.
format Online
Article
Text
id pubmed-7206308
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7206308 2020-05-08 Optimal Topology Search for Fast Model Averaging in Decentralized Parallel SGD Jameel, Mohsan Jawed, Shayan Schmidt-Thieme, Lars Advances in Knowledge Discovery and Data Mining Article Distributed training of deep learning models on high-latency systems necessitates decentralized parallel SGD solutions. However, existing solutions suffer from slow convergence because of hand-crafted topologies. The question arises: “for decentralized parallel SGD, is it possible to learn a topology that provides faster model averaging compared to the hand-crafted counterparts?” By leveraging spectral properties of the graph, we formulate the objective function for finding the topology that provides fast model averaging. Since direct optimization of the objective function is infeasible, we employ a local search algorithm guided by the objective function. We show through extensive empirical evaluation on image classification tasks that the model averaging based on learned topologies leads to fast convergence. An equally important aspect of decentralized parallel SGD is the choice of link weights for sparse model averaging. In contrast to setting weights via Metropolis-Hastings, we propose to use Laplacian link weights on the learned topologies, which provide a significant lift in performance. 2020-04-17 /pmc/articles/PMC7206308/ http://dx.doi.org/10.1007/978-3-030-47436-2_67 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Optimal Topology Search for Fast Model Averaging in Decentralized Parallel SGD
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206308/
http://dx.doi.org/10.1007/978-3-030-47436-2_67
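
Illustrative note: the description in this record contrasts Metropolis-Hastings link weights with Laplacian link weights for sparse model averaging. The record does not give the paper's formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It uses the standard Metropolis-Hastings rule W_ij = 1 / (1 + max(d_i, d_j)) and, as an assumed stand-in for "Laplacian link weights", the constant-weight form W = I - alpha * L with alpha = 1 / (d_max + 1). The helper names, the example graph (a ring with one extra chord), and the choice of alpha are all hypothetical.

```python
# Minimal sketch (assumptions labeled): two ways to build a model-averaging
# matrix W for an undirected worker topology, plus one averaging step x <- W x.

def ring_with_chord(n):
    """Hypothetical example topology: a ring over n workers plus one chord."""
    edges = {(i, (i + 1) % n) for i in range(n)}
    edges.add((0, n // 2))
    return edges

def neighbors(n, edges):
    """Adjacency sets for an undirected edge list."""
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    return nbrs

def metropolis_hastings_weights(n, edges):
    """Standard rule: W_ij = 1 / (1 + max(d_i, d_j)); the diagonal takes the rest."""
    nbrs = neighbors(n, edges)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in nbrs[i]:
            W[i][j] = 1.0 / (1 + max(len(nbrs[i]), len(nbrs[j])))
        W[i][i] = 1.0 - sum(W[i])  # diagonal is still 0.0 when the sum is taken
    return W

def laplacian_weights(n, edges, alpha=None):
    """Assumed form W = I - alpha * L, with L = D - A the graph Laplacian.

    alpha defaults to 1 / (d_max + 1), a conservative choice made for this
    sketch; the paper may select its weights differently.
    """
    nbrs = neighbors(n, edges)
    if alpha is None:
        alpha = 1.0 / (1 + max(len(s) for s in nbrs))
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in nbrs[i]:
            W[i][j] = alpha
        W[i][i] = 1.0 - alpha * len(nbrs[i])
    return W

def average_step(W, x):
    """One round of sparse model averaging: each worker mixes neighbor values."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def disagreement(x):
    """Largest deviation of any worker's value from the global mean."""
    mean = sum(x) / len(x)
    return max(abs(v - mean) for v in x)

if __name__ == "__main__":
    n = 8
    edges = ring_with_chord(n)
    for name, build in [("metropolis-hastings", metropolis_hastings_weights),
                        ("laplacian", laplacian_weights)]:
        W, x = build(n, edges), [float(i) for i in range(n)]
        for _ in range(20):
            x = average_step(W, x)
        print(name, disagreement(x))
```

Both matrices are symmetric with rows summing to one, so repeated averaging preserves the global mean while shrinking the disagreement between workers. How quickly it shrinks depends on the spectrum of W, which is the kind of spectral property the description says the topology search objective is built on.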