Accelerating DNN Training Through Selective Localized Learning
Training Deep Neural Networks (DNNs) places immense compute requirements on the underlying hardware platforms, expending large amounts of time and energy. We propose LoCal+SGD, a new algorithmic approach to accelerate DNN training by selectively combining localized or Hebbian learning within a Stochastic Gradient Descent (SGD) based training framework. Back-propagation is a computationally expensive process that requires 2 Generalized Matrix Multiply (GEMM) operations to compute the error and weight gradients for each layer. We alleviate this by selectively updating some layers' weights using localized learning rules that require only 1 GEMM operation per layer. Further, since localized weight updates are performed during the forward pass itself, the layer activations for such layers do not need to be stored until the backward pass, resulting in a reduced memory footprint. Localized updates can substantially boost training speed, but need to be used judiciously in order to preserve accuracy and convergence. We address this challenge through a Learning Mode Selection Algorithm, which gradually selects and moves layers to localized learning as training progresses. Specifically, for each epoch, the algorithm identifies a Localized→SGD transition layer that delineates the network into two regions. Layers before the transition layer use localized updates, while the transition layer and later layers use gradient-based updates. We propose both static and dynamic approaches to the design of the learning mode selection algorithm. The static algorithm utilizes a pre-defined scheduler function to identify the position of the transition layer, while the dynamic algorithm analyzes the dynamics of the weight updates made to the transition layer to determine how the boundary between SGD and localized updates is shifted in future epochs. We also propose a low-cost weak supervision mechanism that controls the learning rate of localized updates based on the overall training loss. We applied LoCal+SGD to 8 image recognition CNNs (including ResNet50 and MobileNetV2) across 3 datasets (Cifar10, Cifar100, and ImageNet). Our measurements on an Nvidia GTX 1080Ti GPU demonstrate up to 1.5× improvement in end-to-end training time with ~0.5% loss in Top-1 classification accuracy.
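The abstract describes the mechanics only in prose. The NumPy sketch below illustrates the core ideas for a single fully-connected layer: a gradient-based (SGD) update that needs two GEMMs, a Hebbian-style localized update that needs one GEMM and is applied during the forward pass, a static transition-layer scheduler, and weak supervision of the localized learning rate. It is a minimal sketch under assumed details; the function names, the linear schedule, and the loss-based heuristic are illustrative choices, not the authors' implementation.

```python
# Minimal sketch of the mechanisms described in the abstract.
# All names and heuristics here are illustrative assumptions, not the paper's code.
import numpy as np

def sgd_backward(x, delta, W, lr):
    """Gradient-based update for one fully-connected layer (y = x @ W.T).
    Requires 2 GEMMs: one for the weight gradient, one to propagate the error."""
    dW = delta.T @ x            # GEMM 1: weight gradient
    delta_prev = delta @ W      # GEMM 2: error passed to the preceding layer
    W -= lr * dW
    return delta_prev

def localized_update(x, y, W, lr):
    """Hebbian-style localized update, applied during the forward pass.
    Requires only 1 GEMM, and x need not be stored for the backward pass."""
    dW = y.T @ x                # single GEMM: output-input correlation
    W += lr * dW
    return W

def transition_layer_schedule(epoch, num_epochs, num_layers):
    """Static scheduler (assumed linear): the Localized->SGD transition layer
    moves deeper into the network as training progresses."""
    return int(num_layers * epoch / num_epochs)

def weak_supervision_lr(base_lr, current_loss, previous_loss):
    """Weak supervision (assumed heuristic): damp the localized learning rate
    when the overall training loss stops improving."""
    return base_lr if current_loss < previous_loss else 0.5 * base_lr

# Example: a layer placed before the transition layer applies the 1-GEMM
# localized update during its forward pass and can discard its activations.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64))           # a batch of 32 inputs
W = 0.01 * rng.standard_normal((16, 64))    # weights of one layer
y = x @ W.T                                 # forward pass
W = localized_update(x, y, W, lr=1e-3)      # localized (Hebbian) update, 1 GEMM
```

In a full LoCal+SGD-style loop, the scheduler would be consulted once per epoch; only the transition layer and the layers after it would retain their activations and be updated with back-propagation, matching the two-region split described in the abstract.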
Main Authors: | Krithivasan, Sarada; Sen, Sanchari; Venkataramani, Swagath; Raghunathan, Anand
---|---
Format: | Online Article Text
Language: | English
Published: | Frontiers Media S.A., 2022
Subjects: | Neuroscience
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8787307/ https://www.ncbi.nlm.nih.gov/pubmed/35087370 http://dx.doi.org/10.3389/fnins.2021.759807
_version_ | 1784639333468209152 |
---|---|
author | Krithivasan, Sarada; Sen, Sanchari; Venkataramani, Swagath; Raghunathan, Anand
author_facet | Krithivasan, Sarada; Sen, Sanchari; Venkataramani, Swagath; Raghunathan, Anand
author_sort | Krithivasan, Sarada |
collection | PubMed |
description | Training Deep Neural Networks (DNNs) places immense compute requirements on the underlying hardware platforms, expending large amounts of time and energy. We propose LoCal+SGD, a new algorithmic approach to accelerate DNN training by selectively combining localized or Hebbian learning within a Stochastic Gradient Descent (SGD) based training framework. Back-propagation is a computationally expensive process that requires 2 Generalized Matrix Multiply (GEMM) operations to compute the error and weight gradients for each layer. We alleviate this by selectively updating some layers' weights using localized learning rules that require only 1 GEMM operation per layer. Further, since localized weight updates are performed during the forward pass itself, the layer activations for such layers do not need to be stored until the backward pass, resulting in a reduced memory footprint. Localized updates can substantially boost training speed, but need to be used judiciously in order to preserve accuracy and convergence. We address this challenge through a Learning Mode Selection Algorithm, which gradually selects and moves layers to localized learning as training progresses. Specifically, for each epoch, the algorithm identifies a Localized→SGD transition layer that delineates the network into two regions. Layers before the transition layer use localized updates, while the transition layer and later layers use gradient-based updates. We propose both static and dynamic approaches to the design of the learning mode selection algorithm. The static algorithm utilizes a pre-defined scheduler function to identify the position of the transition layer, while the dynamic algorithm analyzes the dynamics of the weight updates made to the transition layer to determine how the boundary between SGD and localized updates is shifted in future epochs. We also propose a low-cost weak supervision mechanism that controls the learning rate of localized updates based on the overall training loss. We applied LoCal+SGD to 8 image recognition CNNs (including ResNet50 and MobileNetV2) across 3 datasets (Cifar10, Cifar100, and ImageNet). Our measurements on an Nvidia GTX 1080Ti GPU demonstrate up to 1.5× improvement in end-to-end training time with ~0.5% loss in Top-1 classification accuracy. |
format | Online Article Text |
id | pubmed-8787307 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-87873072022-01-26 Accelerating DNN Training Through Selective Localized Learning Krithivasan, Sarada Sen, Sanchari Venkataramani, Swagath Raghunathan, Anand Front Neurosci Neuroscience Training Deep Neural Networks (DNNs) places immense compute requirements on the underlying hardware platforms, expending large amounts of time and energy. We propose LoCal+SGD, a new algorithmic approach to accelerate DNN training by selectively combining localized or Hebbian learning within a Stochastic Gradient Descent (SGD) based training framework. Back-propagation is a computationally expensive process that requires 2 Generalized Matrix Multiply (GEMM) operations to compute the error and weight gradients for each layer. We alleviate this by selectively updating some layers' weights using localized learning rules that require only 1 GEMM operation per layer. Further, since localized weight updates are performed during the forward pass itself, the layer activations for such layers do not need to be stored until the backward pass, resulting in a reduced memory footprint. Localized updates can substantially boost training speed, but need to be used judiciously in order to preserve accuracy and convergence. We address this challenge through a Learning Mode Selection Algorithm, which gradually selects and moves layers to localized learning as training progresses. Specifically, for each epoch, the algorithm identifies a Localized→SGD transition layer that delineates the network into two regions. Layers before the transition layer use localized updates, while the transition layer and later layers use gradient-based updates. We propose both static and dynamic approaches to the design of the learning mode selection algorithm. The static algorithm utilizes a pre-defined scheduler function to identify the position of the transition layer, while the dynamic algorithm analyzes the dynamics of the weight updates made to the transition layer to determine how the boundary between SGD and localized updates is shifted in future epochs. We also propose a low-cost weak supervision mechanism that controls the learning rate of localized updates based on the overall training loss. We applied LoCal+SGD to 8 image recognition CNNs (including ResNet50 and MobileNetV2) across 3 datasets (Cifar10, Cifar100, and ImageNet). Our measurements on an Nvidia GTX 1080Ti GPU demonstrate upto 1.5× improvement in end-to-end training time with ~0.5% loss in Top-1 classification accuracy. Frontiers Media S.A. 2022-01-11 /pmc/articles/PMC8787307/ /pubmed/35087370 http://dx.doi.org/10.3389/fnins.2021.759807 Text en Copyright © 2022 Krithivasan, Sen, Venkataramani and Raghunathan. https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Neuroscience Krithivasan, Sarada Sen, Sanchari Venkataramani, Swagath Raghunathan, Anand Accelerating DNN Training Through Selective Localized Learning |
title | Accelerating DNN Training Through Selective Localized Learning |
title_full | Accelerating DNN Training Through Selective Localized Learning |
title_fullStr | Accelerating DNN Training Through Selective Localized Learning |
title_full_unstemmed | Accelerating DNN Training Through Selective Localized Learning |
title_short | Accelerating DNN Training Through Selective Localized Learning |
title_sort | accelerating dnn training through selective localized learning |
topic | Neuroscience |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8787307/ https://www.ncbi.nlm.nih.gov/pubmed/35087370 http://dx.doi.org/10.3389/fnins.2021.759807 |
work_keys_str_mv | AT krithivasansarada acceleratingdnntrainingthroughselectivelocalizedlearning AT sensanchari acceleratingdnntrainingthroughselectivelocalizedlearning AT venkataramaniswagath acceleratingdnntrainingthroughselectivelocalizedlearning AT raghunathananand acceleratingdnntrainingthroughselectivelocalizedlearning |