A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics
This paper is a step towards developing a geometric understanding of a popular algorithm for training deep neural networks named stochastic gradient descent (SGD). We built upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. That motivat...
Main Authors: | Fioresi, Rita; Chaudhari, Pratik; Soatto, Stefano |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516401/ https://www.ncbi.nlm.nih.gov/pubmed/33285876 http://dx.doi.org/10.3390/e22010101 |
_version_ | 1783586992543498240 |
---|---|
author | Fioresi, Rita Chaudhari, Pratik Soatto, Stefano |
author_facet | Fioresi, Rita Chaudhari, Pratik Soatto, Stefano |
author_sort | Fioresi, Rita |
collection | PubMed |
description | This paper is a step towards developing a geometric understanding of a popular algorithm for training deep neural networks named stochastic gradient descent (SGD). We built upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. That motivated a deterministic model in which the trajectories of our dynamical systems are described via geodesics of a family of metrics arising from a certain diffusion matrix; namely, the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role of the electromagnetic field in the latter is played by the gradient of the loss function of a deep network in the former. |
format | Online Article Text |
id | pubmed-7516401 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-75164012020-11-09 A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics Fioresi, Rita Chaudhari, Pratik Soatto, Stefano Entropy (Basel) Article This paper is a step towards developing a geometric understanding of a popular algorithm for training deep neural networks named stochastic gradient descent (SGD). We built upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. That motivated a deterministic model in which the trajectories of our dynamical systems are described via geodesics of a family of metrics arising from a certain diffusion matrix; namely, the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role of the electromagnetic field in the latter is played by the gradient of the loss function of a deep network in the former. MDPI 2020-01-15 /pmc/articles/PMC7516401/ /pubmed/33285876 http://dx.doi.org/10.3390/e22010101 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Fioresi, Rita Chaudhari, Pratik Soatto, Stefano A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics |
title | A Geometric Interpretation of Stochastic Gradient Descent Using Diffusion Metrics |
title_sort | geometric interpretation of stochastic gradient descent using diffusion metrics |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516401/ https://www.ncbi.nlm.nih.gov/pubmed/33285876 http://dx.doi.org/10.3390/e22010101 |