Optimal Topology Search for Fast Model Averaging in Decentralized Parallel SGD
Main authors: | Jameel, Mohsan; Jawed, Shayan; Schmidt-Thieme, Lars |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206308/ http://dx.doi.org/10.1007/978-3-030-47436-2_67 |
field | value |
---|---|
author | Jameel, Mohsan Jawed, Shayan Schmidt-Thieme, Lars |
collection | PubMed |
description | Distributed training of deep learning models on high-latency systems necessitates decentralized parallel SGD solutions. However, existing solutions suffer from slow convergence because of hand-crafted topologies. This raises the question: for decentralized parallel SGD, is it possible to learn a topology that provides faster model averaging than its hand-crafted counterparts? By leveraging spectral properties of the graph, we formulate an objective function for finding a topology that provides fast model averaging. Since direct optimization of this objective is infeasible, we employ a local search algorithm guided by it. We show through extensive empirical evaluation on image classification tasks that model averaging based on learned topologies leads to fast convergence. An equally important aspect of decentralized parallel SGD is the choice of link weights for sparse model averaging. In contrast to setting weights via Metropolis-Hastings, we propose using Laplacian link weights on the learned topologies, which provide a significant lift in performance. |
format | Online Article Text |
id | pubmed-7206308 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7206308 2020-05-08 Optimal Topology Search for Fast Model Averaging in Decentralized Parallel SGD Jameel, Mohsan Jawed, Shayan Schmidt-Thieme, Lars Advances in Knowledge Discovery and Data Mining Article 2020-04-17 /pmc/articles/PMC7206308/ http://dx.doi.org/10.1007/978-3-030-47436-2_67 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | Optimal Topology Search for Fast Model Averaging in Decentralized Parallel SGD |
topic | Article |
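The abstract names three ingredients (a spectral objective, a local search over topologies, and Laplacian versus Metropolis-Hastings link weights) but gives no formulas. The Python sketch below illustrates them under stated assumptions and is not the authors' implementation. The objective is assumed to be the second-largest eigenvalue modulus (SLEM) of the symmetric mixing matrix W, which governs how quickly repeated averaging x ← Wx reaches consensus (smaller is faster). "Laplacian link weights" are assumed to be W = I - αL with the best-constant step α = 2/(λ_max(L) + λ_2(L)) of Xiao and Boyd. The local search is assumed to be hill climbing over single edge swaps that starts from a ring and keeps the edge count fixed and the graph connected. All function names (`ring`, `slem`, `metropolis_hastings`, `laplacian_weights`, `local_search`) are hypothetical, and eigenvalues are estimated by power iteration so the sketch needs only the standard library.

```python
import math
import random

ITERS = 500  # power-iteration steps; ample for the small graphs used here

def ring(n):
    """Hand-crafted baseline topology: node i is linked to i-1 and i+1 (mod n)."""
    return [{(i - 1) % n, (i + 1) % n} for i in range(n)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def center(v):
    """Project v onto the complement of the all-ones (consensus) direction."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def top_eig(apply, n, project=None):
    """Dominant eigenvalue of a symmetric operator by power iteration."""
    rng = random.Random(0)  # fixed seed keeps results reproducible
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    lam = 0.0
    for _ in range(ITERS):
        if project:
            v = project(v)
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return lam
        v = [x / norm for x in v]
        w = apply(v)
        lam = sum(a * b for a, b in zip(v, w))  # Rayleigh quotient (|v| = 1)
        v = w
    return lam

def slem(W):
    """ASSUMED objective: second-largest eigenvalue modulus of a symmetric W with unit
    row sums. W @ W restricted to the consensus complement has top eigenvalue SLEM**2."""
    sq = top_eig(lambda v: matvec(W, matvec(W, v)), len(W), project=center)
    return math.sqrt(max(sq, 0.0))

def metropolis_hastings(adj):
    """Baseline weights: w_ij = 1 / (1 + max(d_i, d_j)) on each edge, remainder on w_ii."""
    n = len(adj)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in adj[i]:
            W[i][j] = 1.0 / (1 + max(len(adj[i]), len(adj[j])))
        W[i][i] = 1.0 - sum(W[i])
    return W

def laplacian_weights(adj):
    """ASSUMED Laplacian weights: W = I - alpha * L, alpha = 2 / (lambda_max(L) + lambda_2(L))."""
    n = len(adj)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = float(len(adj[i]))
        for j in adj[i]:
            L[i][j] = -1.0
    lam_max = top_eig(lambda v: matvec(L, v), n)
    # On the consensus complement, lam_max * I - L has top eigenvalue lam_max - lambda_2(L).
    shifted = top_eig(lambda v: [lam_max * a - b for a, b in zip(v, matvec(L, v))],
                      n, project=center)
    alpha = 2.0 / (2.0 * lam_max - shifted)
    return [[(1.0 if i == j else 0.0) - alpha * L[i][j] for j in range(n)] for i in range(n)]

def is_connected(adj):
    seen, stack = {0}, [0]
    while stack:
        for j in adj[stack.pop()]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == len(adj)

def local_search(n, steps=200, seed=0, weights=laplacian_weights):
    """ASSUMED search: hill climbing over single edge swaps, starting from a ring,
    keeping the edge count fixed and the graph connected; accepts only SLEM decreases."""
    rng = random.Random(seed)
    adj = ring(n)
    best = slem(weights(adj))
    for _ in range(steps):
        edges = [(i, j) for i in range(n) for j in adj[i] if i < j]
        non_edges = [(i, j) for i in range(n) for j in range(i + 1, n) if j not in adj[i]]
        if not non_edges:
            break
        (a, b), (c, d) = rng.choice(edges), rng.choice(non_edges)
        cand = [set(nbrs) for nbrs in adj]
        cand[a].discard(b)
        cand[b].discard(a)
        cand[c].add(d)
        cand[d].add(c)
        if not is_connected(cand):
            continue
        score = slem(weights(cand))
        if score < best - 1e-9:
            adj, best = cand, score
    return adj, best
```

For a 6-node ring, `slem(metropolis_hastings(ring(6)))` is about 0.667 and `slem(laplacian_weights(ring(6)))` is about 0.6, so in this single case the best-constant Laplacian weights mix faster. That is an illustration, not evidence for the abstract's empirical claim. `local_search(8)` returns a connected 8-edge graph whose SLEM is no worse than the ring's.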