A mean field view of the landscape of two-layer neural networks

Multilayer neural networks are among the most powerful models in machine learning, yet the fundamental reasons for this success defy mathematical understanding. Learning a neural network requires optimizing a nonconvex high-dimensional objective (risk function), a problem that is usually attacked using stochastic gradient descent (SGD). Does SGD converge to a global optimum of the risk or only to a local optimum? In the former case, does this happen because local minima are absent or because SGD somehow avoids them? In the latter, why do local minima reached by SGD have good generalization properties? In this paper, we consider a simple case, namely two-layer neural networks, and prove that—in a suitable scaling limit—SGD dynamics is captured by a certain nonlinear partial differential equation (PDE) that we call distributional dynamics (DD). We then consider several specific examples and show how DD can be used to prove convergence of SGD to networks with nearly ideal generalization error. This description allows for “averaging out” some of the complexities of the landscape of neural networks and can be used to prove a general convergence result for noisy SGD.
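
For orientation, the scaling limit announced in the abstract can be summarized in the article's own notation. The display below is a sketch reconstructed from the paper's main definitions, not a substitute for its precise assumptions and statements:

% Sketch in the article's notation (see the paper for the precise statements).
% Two-layer network and population risk:
\[
  \hat{y}(x;\boldsymbol{\theta}) = \frac{1}{N}\sum_{i=1}^{N}\sigma_*(x;\theta_i),
  \qquad
  R_N(\boldsymbol{\theta}) = \mathbb{E}\big\{\big(y-\hat{y}(x;\boldsymbol{\theta})\big)^2\big\}.
\]
% As N grows and the SGD step size scales as \varepsilon\,\xi(k\varepsilon) with
% \varepsilon \to 0, the empirical distribution of the parameters,
% \hat{\rho}^{(N)}_k = N^{-1}\sum_i \delta_{\theta_i^k}, converges to \rho_t solving
% the distributional dynamics (DD):
\[
  \partial_t\rho_t = 2\,\xi(t)\,\nabla_\theta\!\cdot\!\big(\rho_t\,\nabla_\theta\Psi(\theta;\rho_t)\big),
  \qquad
  \Psi(\theta;\rho) = V(\theta) + \int U(\theta,\tilde{\theta})\,\rho(\mathrm{d}\tilde{\theta}),
\]
% where V(\theta) = -\mathbb{E}\{y\,\sigma_*(x;\theta)\} and
% U(\theta_1,\theta_2) = \mathbb{E}\{\sigma_*(x;\theta_1)\,\sigma_*(x;\theta_2)\}.

The risk depends on the parameters only through their empirical distribution, which is what allows the DD to "average out" the finite-N landscape.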

Bibliographic Details
Main Authors: Mei, Song; Montanari, Andrea; Nguyen, Phan-Minh
Format: Online Article (Text)
Language: English
Published: National Academy of Sciences, 2018
Subjects: PNAS Plus
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6099898/
https://www.ncbi.nlm.nih.gov/pubmed/30054315
http://dx.doi.org/10.1073/pnas.1806579115

Journal: Proc Natl Acad Sci U S A (PNAS Plus)
Collection: PubMed (record ID: pubmed-6099898)
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Publication Dates: 2018-08-14 (epub 2018-07-27)
Copyright: © 2018 the Author(s). Published by PNAS. This open access article is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND): https://creativecommons.org/licenses/by-nc-nd/4.0/
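
To make the setup in the abstract concrete, here is a minimal, hypothetical simulation sketch: one-pass SGD on a two-layer network in the mean-field normalization y_hat = (1/N) * sum_i sigma(w_i . x). The tanh activation, Gaussian teacher data, and all hyperparameters are illustrative assumptions, not taken from the paper; the point is only that the empirical distribution of the weights is the natural object to track, as in the DD.

# Minimal sketch (not the authors' code): one-pass SGD for a two-layer
# network in the mean-field scaling described in the abstract. Teacher,
# activation (tanh), and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

d, N = 20, 200     # input dimension, number of hidden units
eps = 0.05         # SGD step size (epsilon in the mean-field scaling)
steps = 20000      # number of one-sample SGD steps

w_teacher = rng.standard_normal(d) / np.sqrt(d)   # hypothetical teacher direction
W = rng.standard_normal((N, d)) / np.sqrt(d)      # student weights w_i, i = 1..N

def predict(W, x):
    """Mean-field two-layer network: average of tanh units."""
    return np.tanh(W @ x).mean()

for k in range(steps):
    x = rng.standard_normal(d)
    y = np.tanh(w_teacher @ x)      # noiseless teacher label
    err = y - predict(W, x)         # residual on this one sample
    # Per-unit gradient of sigma(w_i . x); the 1/N from differentiating
    # y_hat is absorbed into the step-size convention, as in the
    # mean-field update rule sketched after the abstract.
    grad = (1 - np.tanh(W @ x) ** 2)[:, None] * x[None, :]
    W += 2 * eps * err * grad

# In the mean-field limit it is the empirical distribution of the rows of W
# (not any individual w_i) that follows the distributional dynamics.
overlap = W @ w_teacher / (np.linalg.norm(W, axis=1) * np.linalg.norm(w_teacher) + 1e-12)
print(f"mean cosine overlap with teacher: {overlap.mean():.3f}")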