
Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks

This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and true outcomes is used to adaptively set the LR for the network in every epoch of the training cycle. This idea is extended to layer-wise setting of LR, as MI naturally provides a layer-wise performance metric. An LR range test for determining the operating LR range is also proposed. Experiments compared this approach with popular alternatives such as the gradient-based adaptive LR algorithms Adam, RMSprop, and LARS. Competitive-to-better accuracy outcomes, obtained in competitive-to-better time, demonstrate the feasibility of the metric and the approach.
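
A minimal sketch of the idea described in the abstract, assuming a classification setting and NumPy; the plug-in MI estimator, the `mi_decayed_lr` schedule, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def mutual_information(probs, labels, n_classes):
    """Estimate MI (in nats) between the network's predicted class and the
    true label, using the empirical joint distribution over a dataset."""
    preds = probs.argmax(axis=1)
    joint = np.zeros((n_classes, n_classes))
    for p, y in zip(preds, labels):
        joint[p, y] += 1.0
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal over predictions
    py = joint.sum(axis=0, keepdims=True)   # marginal over true labels
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px * py)[nz])))

def mi_decayed_lr(base_lr, mi, labels, n_classes, floor=1e-3):
    """Shrink the LR as MI approaches its upper bound H(Y); a hypothetical
    decay rule for illustration, the paper's exact schedule may differ."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    p = counts / counts.sum()
    h_y = -np.sum(p[p > 0] * np.log(p[p > 0]))  # label entropy bounds MI
    return base_lr * max(1.0 - mi / h_y, floor)

# Per-epoch usage inside a training loop (probs: N x C softmax outputs and
# labels: N integer class labels, collected on a held-out batch):
# mi = mutual_information(probs, labels, n_classes=10)
# lr = mi_decayed_lr(base_lr=0.1, mi=mi, labels=labels, n_classes=10)
```

The same estimator could be applied to intermediate layer activations (e.g., after a linear probe or argmax over per-layer logits) to obtain the layer-wise LRs mentioned in the abstract; that extension is not shown here.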


Bibliographic Details
Main Author: Vasudevan, Shrihari
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7517082/
https://www.ncbi.nlm.nih.gov/pubmed/33286332
http://dx.doi.org/10.3390/e22050560
_version_ 1783587148186779648
author Vasudevan, Shrihari
author_facet Vasudevan, Shrihari
author_sort Vasudevan, Shrihari
collection PubMed
description This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and true outcomes is used to adaptively set the LR for the network in every epoch of the training cycle. This idea is extended to layer-wise setting of LR, as MI naturally provides a layer-wise performance metric. An LR range test for determining the operating LR range is also proposed. Experiments compared this approach with popular alternatives such as the gradient-based adaptive LR algorithms Adam, RMSprop, and LARS. Competitive-to-better accuracy outcomes, obtained in competitive-to-better time, demonstrate the feasibility of the metric and the approach.
format Online
Article
Text
id pubmed-7517082
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7517082 2020-11-09 Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks Vasudevan, Shrihari Entropy (Basel) Article This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and true outcomes is used to adaptively set the LR for the network in every epoch of the training cycle. This idea is extended to layer-wise setting of LR, as MI naturally provides a layer-wise performance metric. An LR range test for determining the operating LR range is also proposed. Experiments compared this approach with popular alternatives such as the gradient-based adaptive LR algorithms Adam, RMSprop, and LARS. Competitive-to-better accuracy outcomes, obtained in competitive-to-better time, demonstrate the feasibility of the metric and the approach. MDPI 2020-05-17 /pmc/articles/PMC7517082/ /pubmed/33286332 http://dx.doi.org/10.3390/e22050560 Text en © 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Vasudevan, Shrihari
Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
title Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
title_full Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
title_fullStr Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
title_full_unstemmed Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
title_short Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
title_sort mutual information based learning rate decay for stochastic gradient descent training of deep neural networks
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7517082/
https://www.ncbi.nlm.nih.gov/pubmed/33286332
http://dx.doi.org/10.3390/e22050560
work_keys_str_mv AT vasudevanshrihari mutualinformationbasedlearningratedecayforstochasticgradientdescenttrainingofdeepneuralnetworks