
Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks

This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and true outcomes is used to adaptively set the LR for the network in every epoch of the training cycle. This idea is extended to layer-wise setting of LR, as MI naturally provides a layer-wise performance metric. An LR range test for determining the operating LR range is also proposed. Experiments compared this approach with popular alternatives such as the gradient-based adaptive LR algorithms Adam, RMSprop, and LARS. Competitive-to-better accuracy outcomes, obtained in competitive-to-better time, demonstrate the feasibility of the metric and the approach.
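
A minimal sketch of the idea described in the abstract, assuming a classification setting and NumPy; the plug-in MI estimator, the `mi_decayed_lr` schedule, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def mutual_information(probs, labels, n_classes):
    """Estimate MI (in nats) between the network's predicted class and the
    true label, using the empirical joint distribution over a dataset."""
    preds = probs.argmax(axis=1)
    joint = np.zeros((n_classes, n_classes))
    for p, y in zip(preds, labels):
        joint[p, y] += 1.0
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal over predictions
    py = joint.sum(axis=0, keepdims=True)   # marginal over true labels
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px * py)[nz])))

def mi_decayed_lr(base_lr, mi, labels, n_classes, floor=1e-3):
    """Shrink the LR as MI approaches its upper bound H(Y); a hypothetical
    decay rule for illustration, the paper's exact schedule may differ."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    p = counts / counts.sum()
    h_y = -np.sum(p[p > 0] * np.log(p[p > 0]))  # label entropy bounds MI
    return base_lr * max(1.0 - mi / h_y, floor)

# Per-epoch usage inside a training loop (probs: N x C softmax outputs and
# labels: N integer class labels, collected on a held-out batch):
# mi = mutual_information(probs, labels, n_classes=10)
# lr = mi_decayed_lr(base_lr=0.1, mi=mi, labels=labels, n_classes=10)
```

The same estimator could be applied to intermediate layer activations (e.g., after a linear probe or argmax over per-layer logits) to obtain the layer-wise LRs mentioned in the abstract; that extension is not shown here.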


Bibliographic Details
Main Author: Vasudevan, Shrihari
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7517082/
https://www.ncbi.nlm.nih.gov/pubmed/33286332
http://dx.doi.org/10.3390/e22050560
_version_ 1783587148186779648
author Vasudevan, Shrihari
author_facet Vasudevan, Shrihari
author_sort Vasudevan, Shrihari
collection PubMed
description This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and true outcomes is used to adaptively set the LR for the network in every epoch of the training cycle. This idea is extended to layer-wise setting of LR, as MI naturally provides a layer-wise performance metric. An LR range test for determining the operating LR range is also proposed. Experiments compared this approach with popular alternatives such as the gradient-based adaptive LR algorithms Adam, RMSprop, and LARS. Competitive-to-better accuracy outcomes, obtained in competitive-to-better time, demonstrate the feasibility of the metric and the approach.
format Online
Article
Text
id pubmed-7517082
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7517082 2020-11-09 Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks Vasudevan, Shrihari Entropy (Basel) Article This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and true outcomes is used to adaptively set the LR for the network in every epoch of the training cycle. This idea is extended to layer-wise setting of LR, as MI naturally provides a layer-wise performance metric. An LR range test for determining the operating LR range is also proposed. Experiments compared this approach with popular alternatives such as the gradient-based adaptive LR algorithms Adam, RMSprop, and LARS. Competitive-to-better accuracy outcomes, obtained in competitive-to-better time, demonstrate the feasibility of the metric and the approach. MDPI 2020-05-17 /pmc/articles/PMC7517082/ /pubmed/33286332 http://dx.doi.org/10.3390/e22050560 Text en © 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Vasudevan, Shrihari
Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
title Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
title_full Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
title_fullStr Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
title_full_unstemmed Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
title_short Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
title_sort mutual information based learning rate decay for stochastic gradient descent training of deep neural networks
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7517082/
https://www.ncbi.nlm.nih.gov/pubmed/33286332
http://dx.doi.org/10.3390/e22050560
work_keys_str_mv AT vasudevanshrihari mutualinformationbasedlearningratedecayforstochasticgradientdescenttrainingofdeepneuralnetworks