
Proactive Congestion Avoidance for Distributed Deep Learning

This paper presents “Proactive Congestion Notification” (PCN), a congestion-avoidance technique for distributed deep learning (DDL). DDL is widely used to scale out and accelerate deep neural network training. In DDL, each worker trains a copy of the deep learning model with different training input...


Bibliographic Details
Main Authors:	Kang, Minkoo; Yang, Gyeongsik; Yoo, Yeonho; Yoo, Chuck
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7796356/ https://www.ncbi.nlm.nih.gov/pubmed/33383840 http://dx.doi.org/10.3390/s21010174

author	Kang, Minkoo Yang, Gyeongsik Yoo, Yeonho Yoo, Chuck
collection	PubMed
description	This paper presents “Proactive Congestion Notification” (PCN), a congestion-avoidance technique for distributed deep learning (DDL). DDL is widely used to scale out and accelerate deep neural network training. In DDL, each worker trains a copy of the deep learning model with different training inputs and synchronizes the model gradients at the end of each iteration. However, it is well known that the network communication for synchronizing model parameters is the main bottleneck in DDL. Our key observation is that the DDL architecture makes each worker generate burst traffic every iteration, which causes network congestion and in turn degrades the throughput of DDL traffic. Based on this observation, the key idea behind PCN is to prevent potential congestion by proactively regulating the switch queue length before DDL burst traffic arrives at the switch, which prepares the switches for handling incoming DDL bursts. In our evaluation, PCN improves the throughput of DDL traffic by 72% on average.
format	Online Article Text
id	pubmed-7796356
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7796356 2021-01-10 Proactive Congestion Avoidance for Distributed Deep Learning Kang, Minkoo Yang, Gyeongsik Yoo, Yeonho Yoo, Chuck Sensors (Basel) Article MDPI 2020-12-29 /pmc/articles/PMC7796356/ /pubmed/33383840 http://dx.doi.org/10.3390/s21010174 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	Proactive Congestion Avoidance for Distributed Deep Learning
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7796356/ https://www.ncbi.nlm.nih.gov/pubmed/33383840 http://dx.doi.org/10.3390/s21010174


Similar Items