
Dynamics in Deep Classifiers Trained with the Square Loss: Normalization, Low Rank, Neural Collapse, and Generalization Bounds

Bibliographic Details
Main Authors: Xu, Mengjia, Rangamani, Akshay, Liao, Qianli, Galanti, Tomer, Poggio, Tomaso
Format: Online Article Text
Language: English
Published: AAAS 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10202460/
https://www.ncbi.nlm.nih.gov/pubmed/37223467
http://dx.doi.org/10.34133/research.0024
collection PubMed
description We overview several properties—old and new—of training overparameterized deep networks under the square loss. We first consider a model of the dynamics of gradient flow under the square loss in deep homogeneous rectified linear unit networks. We study the convergence to a solution with the absolute minimum ρ, which is the product of the Frobenius norms of each layer weight matrix, when normalization by Lagrange multipliers is used together with weight decay under different forms of gradient descent. A main property of the minimizers that bounds their expected error for a specific network architecture is ρ. In particular, we derive novel norm-based bounds for convolutional layers that are orders of magnitude better than classical bounds for dense networks. Next, we prove that quasi-interpolating solutions obtained by stochastic gradient descent in the presence of weight decay have a bias toward low-rank weight matrices, which should improve generalization. The same analysis predicts the existence of an inherent stochastic gradient descent noise for deep networks. In both cases, we verify our predictions experimentally. We then predict neural collapse and its properties without any specific assumption—unlike other published proofs. Our analysis supports the idea that the advantage of deep networks relative to other classifiers is greater for problems that are appropriate for sparse deep architectures such as convolutional neural networks. The reason is that compositionally sparse target functions can be approximated well by “sparse” deep networks without incurring the curse of dimensionality.
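
The description above names two quantities that are easy to make concrete: the norm product ρ, defined as the product of the Frobenius norms of the layer weight matrices, and the rank of those matrices, which the predicted low-rank bias concerns. The following NumPy sketch is a hypothetical illustration, not code from the paper: it computes ρ and a simple effective-rank measure for randomly initialized weight matrices of a toy network; the layer shapes and the effective-rank threshold are assumptions made for the example.

# Hypothetical sketch (not from the paper): compute the norm product rho and a
# simple effective-rank measure for the weight matrices of a toy deep network.
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy stand-ins for the layer weight matrices W_1, ..., W_L.
layer_shapes = [(64, 32), (32, 32), (32, 10)]
weights = [rng.standard_normal(shape) / np.sqrt(shape[1]) for shape in layer_shapes]

# rho: the product of the Frobenius norms of the layer weight matrices.
rho = float(np.prod([np.linalg.norm(W, ord="fro") for W in weights]))

def effective_rank(W, rel_tol=0.1):
    # Count singular values above rel_tol times the largest one (an assumed convention).
    s = np.linalg.svd(W, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

print(f"rho = {rho:.3f}")
for i, W in enumerate(weights, start=1):
    print(f"layer {i}: shape {W.shape}, effective rank {effective_rank(W)}")
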
id pubmed-10202460
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Research (Wash D C)
publishDate 2023-03-08
license Copyright © 2023 Mengjia Xu et al. Exclusive licensee Science and Technology Review Publishing House. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/).
topic Research Article