Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics
Main Authors: Sahs, Justin; Pyle, Ryan; Damaraju, Aneel; Caro, Josue Ortega; Tavaslioglu, Onur; Lu, Andy; Anselmi, Fabio; Patel, Ankit B.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2022
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9131019/ https://www.ncbi.nlm.nih.gov/pubmed/35647529 http://dx.doi.org/10.3389/frai.2022.889981
_version_ | 1784713097455337472 |
author | Sahs, Justin; Pyle, Ryan; Damaraju, Aneel; Caro, Josue Ortega; Tavaslioglu, Onur; Lu, Andy; Anselmi, Fabio; Patel, Ankit B. |
author_facet | Sahs, Justin; Pyle, Ryan; Damaraju, Aneel; Caro, Josue Ortega; Tavaslioglu, Onur; Lu, Andy; Anselmi, Fabio; Patel, Ankit B. |
author_sort | Sahs, Justin |
collection | PubMed |
description | Understanding the learning dynamics and inductive bias of neural networks (NNs) is hindered by the opacity of the relationship between NN parameters and the function represented. Partially, this is due to symmetries inherent within the NN parameterization, allowing multiple different parameter settings to result in an identical output function, resulting in both an unclear relationship and redundant degrees of freedom. The NN parameterization is invariant under two symmetries: permutation of the neurons and a continuous family of transformations of the scale of weight and bias parameters. We propose taking a quotient with respect to the second symmetry group and reparametrizing ReLU NNs as continuous piecewise linear splines. Using this spline lens, we study learning dynamics in shallow univariate ReLU NNs, finding unexpected insights and explanations for several perplexing phenomena. We develop a surprisingly simple and transparent view of the structure of the loss surface, including its critical and fixed points, Hessian, and Hessian spectrum. We also show that standard weight initializations yield very flat initial functions, and that this flatness, together with overparametrization and the initial weight scale, is responsible for the strength and type of implicit regularization, consistent with previous work. Our implicit regularization results are complementary to recent work, showing that initialization scale critically controls implicit regularization via a kernel-based argument. Overall, removing the weight scale symmetry enables us to prove these results more simply and enables us to prove new results and gain new insights while offering a far more transparent and intuitive picture. Looking forward, our quotiented spline-based approach will extend naturally to the multivariate and deep settings, and alongside the kernel-based view, we believe it will play a foundational role in efforts to understand neural networks. 
Videos of learning dynamics using a spline-based visualization are available at http://shorturl.at/tFWZ2. |
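The scale symmetry described in the abstract can be illustrated with a minimal NumPy sketch (not the authors' code; it assumes the standard parameterization f(x) = Σ_i v_i ReLU(w_i x + b_i)). Because ReLU is positively homogeneous, rescaling each neuron's input weight and bias by a > 0 while dividing its output weight by a leaves the function unchanged, whereas the spline description of each neuron — a knot at x = -b/w with a slope change of v·|w| — is invariant under that rescaling:

```python
import numpy as np

def relu_net(x, w, b, v):
    """Shallow univariate ReLU network: f(x) = sum_i v_i * max(w_i * x + b_i, 0)."""
    return v @ np.maximum(np.outer(w, x) + b[:, None], 0.0)

rng = np.random.default_rng(0)
n = 5  # number of hidden neurons
w, b, v = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
x = np.linspace(-3.0, 3.0, 201)

# Per-neuron rescaling (w, b, v) -> (a*w, a*b, v/a) with a > 0 leaves the
# output function unchanged, since max(a*z, 0) = a * max(z, 0) for a > 0.
a = rng.uniform(0.5, 2.0, size=n)
f1 = relu_net(x, w, b, v)
f2 = relu_net(x, a * w, a * b, v / a)
print(np.allclose(f1, f2))  # True

# Spline (quotient) coordinates are symmetry-invariant: each neuron puts a
# knot at x = -b/w, where the slope jumps by v * |w|.
knots1, jumps1 = -b / w, v * np.abs(w)
knots2, jumps2 = -(a * b) / (a * w), (v / a) * np.abs(a * w)
print(np.allclose(knots1, knots2) and np.allclose(jumps1, jumps2))  # True
```

Quotienting out this continuous symmetry, as the paper proposes, amounts to working directly in the knot/slope-change coordinates rather than the redundant (w, b, v) parameters.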
format | Online Article Text |
id | pubmed-9131019 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9131019 2022-05-26 Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics Sahs, Justin; Pyle, Ryan; Damaraju, Aneel; Caro, Josue Ortega; Tavaslioglu, Onur; Lu, Andy; Anselmi, Fabio; Patel, Ankit B. Front Artif Intell Artificial Intelligence Frontiers Media S.A. 2022-05-11 /pmc/articles/PMC9131019/ /pubmed/35647529 http://dx.doi.org/10.3389/frai.2022.889981 Text en Copyright © 2022 Sahs, Pyle, Damaraju, Caro, Tavaslioglu, Lu, Anselmi and Patel. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Artificial Intelligence Sahs, Justin Pyle, Ryan Damaraju, Aneel Caro, Josue Ortega Tavaslioglu, Onur Lu, Andy Anselmi, Fabio Patel, Ankit B. Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics |
title | Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics |
title_full | Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics |
title_fullStr | Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics |
title_full_unstemmed | Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics |
title_short | Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics |
title_sort | shallow univariate relu networks as splines: initialization, loss surface, hessian, and gradient flow dynamics |
topic | Artificial Intelligence |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9131019/ https://www.ncbi.nlm.nih.gov/pubmed/35647529 http://dx.doi.org/10.3389/frai.2022.889981 |
work_keys_str_mv | AT sahsjustin shallowunivariaterelunetworksassplinesinitializationlosssurfacehessianandgradientflowdynamics AT pyleryan shallowunivariaterelunetworksassplinesinitializationlosssurfacehessianandgradientflowdynamics AT damarajuaneel shallowunivariaterelunetworksassplinesinitializationlosssurfacehessianandgradientflowdynamics AT carojosueortega shallowunivariaterelunetworksassplinesinitializationlosssurfacehessianandgradientflowdynamics AT tavasliogluonur shallowunivariaterelunetworksassplinesinitializationlosssurfacehessianandgradientflowdynamics AT luandy shallowunivariaterelunetworksassplinesinitializationlosssurfacehessianandgradientflowdynamics AT anselmifabio shallowunivariaterelunetworksassplinesinitializationlosssurfacehessianandgradientflowdynamics AT patelankitb shallowunivariaterelunetworksassplinesinitializationlosssurfacehessianandgradientflowdynamics |