A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics

This paper is a step towards developing a geometric understanding of stochastic gradient descent (SGD), a popular algorithm for training deep neural networks. We built upon a recent result showing that the noise in SGD while training typical networks is highly non-isotropic. That motivat...


Bibliographic Details
Main authors: Fioresi, Rita, Chaudhari, Pratik, Soatto, Stefano
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516401/
https://www.ncbi.nlm.nih.gov/pubmed/33285876
http://dx.doi.org/10.3390/e22010101
_version_ 1783586992543498240
author Fioresi, Rita
Chaudhari, Pratik
Soatto, Stefano
collection PubMed
description This paper is a step towards developing a geometric understanding of stochastic gradient descent (SGD), a popular algorithm for training deep neural networks. We built upon a recent result showing that the noise in SGD while training typical networks is highly non-isotropic. That motivated a deterministic model in which the trajectories of our dynamical systems are described via geodesics of a family of metrics arising from a certain diffusion matrix, namely the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role of the electromagnetic field in the latter is played by the gradient of the loss function of a deep network in the former.
format Online
Article
Text
id pubmed-7516401
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7516401
2020-11-09
A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics
Fioresi, Rita
Chaudhari, Pratik
Soatto, Stefano
Entropy (Basel)
Article
This paper is a step towards developing a geometric understanding of a popular algorithm for training deep neural networks named stochastic gradient descent (SGD). We built upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. That motivated a deterministic model in which the trajectories of our dynamical systems are described via geodesics of a family of metrics arising from a certain diffusion matrix; namely, the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role of the electromagnetic field in the latter is played by the gradient of the loss function of a deep network in the former.
MDPI
2020-01-15
/pmc/articles/PMC7516401/
/pubmed/33285876
http://dx.doi.org/10.3390/e22010101
Text
en
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics
topic Article