Optimal Topology Search for Fast Model Averaging in Decentralized Parallel SGD
Main authors:
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206308/
http://dx.doi.org/10.1007/978-3-030-47436-2_67

Summary: Distributed training of deep learning models on high-latency systems necessitates decentralized parallel SGD solutions. However, existing solutions suffer from slow convergence because of hand-crafted topologies. This raises the question: for decentralized parallel SGD, is it possible to learn a topology that provides faster model averaging than hand-crafted counterparts? By leveraging spectral properties of the graph, we formulate an objective function for finding topologies that provide fast model averaging. Since direct optimization of this objective is infeasible, we employ a local search algorithm guided by it. Through extensive empirical evaluation on image classification tasks, we show that model averaging based on the learned topologies leads to fast convergence. An equally important aspect of decentralized parallel SGD is the choice of link weights for sparse model averaging. Instead of setting weights via Metropolis-Hastings, we propose Laplacian link weights on the learned topologies, which provide a significant lift in performance.
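The summary contrasts Metropolis-Hastings link weights with Laplacian link weights and evaluates topologies by how fast they average models. The Python sketch below is not the authors' code: it uses the standard Metropolis-Hastings weighting, a textbook "best constant" Laplacian weighting W = I - alpha*L, and the second-largest eigenvalue modulus (SLEM) of the mixing matrix as a proxy for averaging speed; the paper's exact weighting and objective are not given in this record.

```python
import numpy as np


def metropolis_hastings_weights(adj):
    """Sparse mixing matrix with Metropolis-Hastings link weights.

    W[i, j] = 1 / (1 + max(deg_i, deg_j)) on every edge (i, j);
    the diagonal absorbs the remaining mass so each row sums to 1.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W


def laplacian_weights(adj):
    """Mixing matrix W = I - alpha * L built from the graph Laplacian.

    alpha = 2 / (lambda_2 + lambda_n) is the classical 'best constant'
    choice for a connected graph; the paper's Laplacian weighting may differ.
    """
    L = np.diag(adj.sum(axis=1)) - adj
    eig = np.sort(np.linalg.eigvalsh(L))   # eig[0] = 0 for a connected graph
    alpha = 2.0 / (eig[1] + eig[-1])
    return np.eye(adj.shape[0]) - alpha * L


def slem(W):
    """Second-largest eigenvalue modulus of W: smaller means faster averaging."""
    return np.sort(np.abs(np.linalg.eigvals(W)))[-2]


# Toy comparison on an 8-node ring topology.
n = 8
ring = np.zeros((n, n), dtype=int)
for i in range(n):
    ring[i, (i + 1) % n] = ring[(i + 1) % n, i] = 1

print("Metropolis-Hastings SLEM:", slem(metropolis_hastings_weights(ring)))
print("Laplacian weights SLEM:  ", slem(laplacian_weights(ring)))
```

On this toy ring the Laplacian-weighted mixing matrix has a smaller SLEM than the Metropolis-Hastings one, which is the kind of lift in averaging speed the abstract refers to.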
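The summary also mentions a local search guided by the objective function, since the objective cannot be optimized directly. The following sketch is a hypothetical greedy edge-rewiring search, not the paper's algorithm: the starting graph, the single-edge flip move, the `degree_budget` constraint, and the acceptance rule are all illustrative assumptions. The `score` argument can reuse `slem` and `laplacian_weights` from the previous sketch.

```python
import numpy as np


def local_search_topology(n, degree_budget, score, iters=200, seed=0):
    """Greedy local search over sparse topologies guided by a spectral score.

    Starts from a ring (so the graph is connected), repeatedly proposes
    flipping one edge, and keeps the move if it respects the degree budget,
    keeps the graph connected, and lowers the score.
    """
    rng = np.random.default_rng(seed)
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1

    best = score(adj)
    for _ in range(iters):
        cand = adj.copy()
        i, j = rng.choice(n, size=2, replace=False)
        cand[i, j] = cand[j, i] = 1 - cand[i, j]      # flip one edge
        deg = cand.sum(axis=1)
        if deg.max() > degree_budget or deg.min() == 0:
            continue                                   # enforce sparsity, no isolated nodes
        L = np.diag(deg) - cand
        if np.sort(np.linalg.eigvalsh(L))[1] < 1e-9:
            continue                                   # reject disconnecting moves
        s = score(cand)
        if s < best:
            adj, best = cand, s
    return adj, best
```

For example, `local_search_topology(16, 4, lambda a: slem(laplacian_weights(a)))` searches for a 16-node topology with at most four neighbors per node, scoring each candidate by the SLEM of its Laplacian-weighted mixing matrix.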