Prevalence of neural collapse during the terminal phase of deep learning training

Modern practice for training classification deepnets involves a terminal phase of training (TPT), which begins at the epoch where training error first vanishes. During TPT, the training error stays effectively zero, while training loss is pushed toward zero. Direct measurements of TPT, for three prototypical deepnet architectures and across seven canonical classification datasets, expose a pervasive inductive bias we call neural collapse (NC), involving four deeply interconnected phenomena. (NC1) Cross-example within-class variability of last-layer training activations collapses to zero, as the individual activations themselves collapse to their class means. (NC2) The class means collapse to the vertices of a simplex equiangular tight frame (ETF). (NC3) Up to rescaling, the last-layer classifiers collapse to the class means, or in other words, to the simplex ETF (i.e., to a self-dual configuration). (NC4) For a given activation, the classifier's decision collapses to simply choosing whichever class has the closest train class mean (i.e., the nearest class center [NCC] decision rule). The symmetric and very simple geometry induced by the TPT confers important benefits, including better generalization performance, better robustness, and better interpretability.
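
For reference, the geometry named in (NC2) and the decision rule in (NC4) can be stated concisely. The notation below (C classes, feature dimension p, partial rotation P, class means μ_c, last-layer activation h) follows the standard formulation and is sketched here only for illustration; it is not part of this record. A simplex ETF is a matrix of the form

\[
M = \sqrt{\frac{C}{C-1}}\; P \left( I_C - \frac{1}{C}\,\mathbf{1}_C \mathbf{1}_C^{\top} \right),
\qquad P \in \mathbb{R}^{p \times C},\ P^{\top} P = I_C ,
\]

whose columns (the vertices toward which the centered class means collapse) have equal norms and pairwise cosines of \(-1/(C-1)\). The nearest class center rule of (NC4) is then

\[
\hat{c}(h) = \operatorname*{arg\,min}_{c \in \{1,\dots,C\}} \lVert h - \mu_c \rVert_2 .
\]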

Bibliographic Details
Main Authors: Papyan, Vardan, Han, X. Y., Donoho, David L.
Format: Online Article Text
Language: English
Published: National Academy of Sciences, 2020
Subjects: Physical Sciences
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7547234/
https://www.ncbi.nlm.nih.gov/pubmed/32958680
http://dx.doi.org/10.1073/pnas.2015509117
Collection: PubMed
ID: pubmed-7547234
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Proc Natl Acad Sci U S A
Published: 2020-09-21 (online); 2020-10-06 (issue date)
Rights: Copyright © 2020 the Author(s). Published by PNAS. This open access article is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND): https://creativecommons.org/licenses/by-nc-nd/4.0/