Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and true outcomes is used to adaptively set the LR for the network, in every epoch of the training cycle. This idea is extended to layer-wise setting of LR, as MI naturally provides a layer-wise performance metric. An LR range test determining the operating LR range is also proposed. Experiments compared this approach with popular alternatives such as gradient-based adaptive LR algorithms like Adam, RMSprop, and LARS. Accuracy outcomes that are competitive with or better than those alternatives, obtained in competitive or better training time, demonstrate the feasibility of the metric and the approach.
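As context for the abstract's core idea (resetting the SGD learning rate each epoch from the MI between network outputs and true labels), a minimal Python sketch follows. This is illustrative, not the paper's implementation: the empirical MI estimator over predicted vs. true classes, the linear MI-to-LR mapping in `mi_decayed_lr`, and the parameters `lr_min`, `lr_max`, and `mi_max` are all assumptions made here for demonstration.

```python
# Illustrative sketch of MI-driven learning-rate decay (not the paper's exact rule).
# Assumes discrete class labels; MI is estimated empirically from the joint
# distribution of predicted vs. true classes on a held-out batch.
import numpy as np

def empirical_mi(y_true, y_pred, n_classes):
    """Empirical mutual information (in nats) between true and predicted labels."""
    joint = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        joint[t, p] += 1.0
    joint /= joint.sum()
    pt = joint.sum(axis=1, keepdims=True)   # marginal of true labels, shape (C, 1)
    pp = joint.sum(axis=0, keepdims=True)   # marginal of predictions, shape (1, C)
    mask = joint > 0                        # avoid log(0) on empty cells
    return float((joint[mask] * np.log(joint[mask] / (pt @ pp)[mask])).sum())

def mi_decayed_lr(mi, lr_min, lr_max, mi_max):
    """Map MI into [lr_min, lr_max]: high MI (good fit) -> small LR (assumed mapping)."""
    frac = min(max(mi / mi_max, 0.0), 1.0)
    return lr_max - frac * (lr_max - lr_min)

# Usage: at the end of each epoch, estimate MI and reset the SGD learning rate.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 10, size=512)
# Synthetic predictions that agree with the truth ~70% of the time.
y_pred = np.where(rng.random(512) < 0.7, y_true, rng.integers(0, 10, size=512))
mi = empirical_mi(y_true, y_pred, n_classes=10)
lr = mi_decayed_lr(mi, lr_min=1e-4, lr_max=1e-1, mi_max=np.log(10))
print(f"MI = {mi:.3f} nats -> LR = {lr:.4f}")
```

The choice of `mi_max = log(10)` reflects the largest MI attainable for 10 classes with uniform marginals; as MI rises toward that ceiling during training, the sketch decays the LR toward `lr_min`, mirroring the decaying-LR behaviour the abstract describes.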
Main Author: | Vasudevan, Shrihari
---|---
Format: | Online Article Text
Language: | English
Published: | MDPI, 2020
Subjects: | Article
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7517082/ https://www.ncbi.nlm.nih.gov/pubmed/33286332 http://dx.doi.org/10.3390/e22050560
_version_ | 1783587148186779648 |
---|---|
author | Vasudevan, Shrihari |
author_facet | Vasudevan, Shrihari |
author_sort | Vasudevan, Shrihari |
collection | PubMed |
description | This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and true outcomes is used to adaptively set the LR for the network, in every epoch of the training cycle. This idea is extended to layer-wise setting of LR, as MI naturally provides a layer-wise performance metric. An LR range test determining the operating LR range is also proposed. Experiments compared this approach with popular alternatives such as gradient-based adaptive LR algorithms like Adam, RMSprop, and LARS. Accuracy outcomes that are competitive with or better than those alternatives, obtained in competitive or better training time, demonstrate the feasibility of the metric and the approach. |
format | Online Article Text |
id | pubmed-7517082 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7517082 2020-11-09 Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks Vasudevan, Shrihari Entropy (Basel) Article This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and true outcomes is used to adaptively set the LR for the network, in every epoch of the training cycle. This idea is extended to layer-wise setting of LR, as MI naturally provides a layer-wise performance metric. An LR range test determining the operating LR range is also proposed. Experiments compared this approach with popular alternatives such as gradient-based adaptive LR algorithms like Adam, RMSprop, and LARS. Accuracy outcomes that are competitive with or better than those alternatives, obtained in competitive or better training time, demonstrate the feasibility of the metric and the approach. MDPI 2020-05-17 /pmc/articles/PMC7517082/ /pubmed/33286332 http://dx.doi.org/10.3390/e22050560 Text en © 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Vasudevan, Shrihari Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks |
title | Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks |
title_full | Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks |
title_fullStr | Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks |
title_full_unstemmed | Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks |
title_short | Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks |
title_sort | mutual information based learning rate decay for stochastic gradient descent training of deep neural networks |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7517082/ https://www.ncbi.nlm.nih.gov/pubmed/33286332 http://dx.doi.org/10.3390/e22050560 |
work_keys_str_mv | AT vasudevanshrihari mutualinformationbasedlearningratedecayforstochasticgradientdescenttrainingofdeepneuralnetworks |