Prevalence of neural collapse during the terminal phase of deep learning training
Modern practice for training classification deepnets involves a terminal phase of training (TPT), which begins at the epoch where training error first vanishes. During TPT, the training error stays effectively zero, while training loss is pushed toward zero. Direct measurements of TPT, for three pro...
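The description field of the record below identifies four neural-collapse phenomena (NC1 through NC4). As an illustrative sketch that is not part of the record itself, the simplex equiangular tight frame referenced in NC2 can be written, for C classes, as follows:

```latex
% Illustrative sketch (not from this record): a simplex ETF with C vertices.
% Up to rotation and rescaling, the C class-mean directions are the columns of
\[
  M \;=\; \sqrt{\tfrac{C}{C-1}}\; U \Bigl( I_C - \tfrac{1}{C}\,\mathbf{1}_C \mathbf{1}_C^{\top} \Bigr),
\]
% where U has orthonormal columns (U^T U = I_C). Any two distinct columns of M
% then have unit norm and meet at the same obtuse angle:
\[
  \frac{\langle m_i, m_j \rangle}{\lVert m_i \rVert\,\lVert m_j \rVert} \;=\; -\frac{1}{C-1},
  \qquad i \neq j .
\]
```

Per NC2 and NC3 in the abstract, both the centered class means and, up to rescaling, the last-layer classifiers align with such an equiangular configuration.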
Main authors: | Papyan, Vardan; Han, X. Y.; Donoho, David L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | National Academy of Sciences, 2020 |
Subjects: | Physical Sciences |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7547234/ https://www.ncbi.nlm.nih.gov/pubmed/32958680 http://dx.doi.org/10.1073/pnas.2015509117 |
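To make NC1 and NC4 from the abstract concrete, here is a minimal, hypothetical NumPy sketch (not from the paper or this record) of how within-class variability collapse could be measured and the nearest-class-center rule applied to last-layer activations:

```python
import numpy as np

def class_means(activations, labels, num_classes):
    """Per-class mean of last-layer activations (the class means of the abstract)."""
    return np.stack([activations[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def within_class_variability(activations, labels, means):
    """NC1 proxy: mean squared distance of each activation to its class mean.
    Neural collapse predicts this tends to zero during the terminal phase."""
    return float(np.mean([np.sum((a - means[y]) ** 2)
                          for a, y in zip(activations, labels)]))

def nearest_class_center(activations, means):
    """NC4: assign each activation to the class with the closest class mean."""
    dists = np.linalg.norm(activations[:, None, :] - means[None, :, :], axis=-1)
    return dists.argmin(axis=1)

# Hypothetical usage: H is an (N, d) array of last-layer activations and y an
# (N,) array of integer labels in {0, ..., C-1}.
# mu = class_means(H, y, C)
# nc1 = within_class_variability(H, y, mu)
# preds = nearest_class_center(H, mu)
```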
Field | Value
---|---|
_version_ | 1783592383402737664
author | Papyan, Vardan; Han, X. Y.; Donoho, David L.
author_facet | Papyan, Vardan; Han, X. Y.; Donoho, David L.
author_sort | Papyan, Vardan |
collection | PubMed |
description | Modern practice for training classification deepnets involves a terminal phase of training (TPT), which begins at the epoch where training error first vanishes. During TPT, the training error stays effectively zero, while training loss is pushed toward zero. Direct measurements of TPT, for three prototypical deepnet architectures and across seven canonical classification datasets, expose a pervasive inductive bias we call neural collapse (NC), involving four deeply interconnected phenomena. (NC1) Cross-example within-class variability of last-layer training activations collapses to zero, as the individual activations themselves collapse to their class means. (NC2) The class means collapse to the vertices of a simplex equiangular tight frame (ETF). (NC3) Up to rescaling, the last-layer classifiers collapse to the class means or in other words, to the simplex ETF (i.e., to a self-dual configuration). (NC4) For a given activation, the classifier’s decision collapses to simply choosing whichever class has the closest train class mean (i.e., the nearest class center [NCC] decision rule). The symmetric and very simple geometry induced by the TPT confers important benefits, including better generalization performance, better robustness, and better interpretability. |
format | Online Article Text |
id | pubmed-7547234 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | National Academy of Sciences |
record_format | MEDLINE/PubMed |
spelling | pubmed-7547234 2020-10-22 Prevalence of neural collapse during the terminal phase of deep learning training Papyan, Vardan; Han, X. Y.; Donoho, David L. Proc Natl Acad Sci U S A Physical Sciences Modern practice for training classification deepnets involves a terminal phase of training (TPT), which begins at the epoch where training error first vanishes. During TPT, the training error stays effectively zero, while training loss is pushed toward zero. Direct measurements of TPT, for three prototypical deepnet architectures and across seven canonical classification datasets, expose a pervasive inductive bias we call neural collapse (NC), involving four deeply interconnected phenomena. (NC1) Cross-example within-class variability of last-layer training activations collapses to zero, as the individual activations themselves collapse to their class means. (NC2) The class means collapse to the vertices of a simplex equiangular tight frame (ETF). (NC3) Up to rescaling, the last-layer classifiers collapse to the class means or in other words, to the simplex ETF (i.e., to a self-dual configuration). (NC4) For a given activation, the classifier’s decision collapses to simply choosing whichever class has the closest train class mean (i.e., the nearest class center [NCC] decision rule). The symmetric and very simple geometry induced by the TPT confers important benefits, including better generalization performance, better robustness, and better interpretability. National Academy of Sciences 2020-10-06 2020-09-21 /pmc/articles/PMC7547234/ /pubmed/32958680 http://dx.doi.org/10.1073/pnas.2015509117 Text en Copyright © 2020 the Author(s). Published by PNAS. This open access article is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Physical Sciences Papyan, Vardan; Han, X. Y.; Donoho, David L. Prevalence of neural collapse during the terminal phase of deep learning training
title | Prevalence of neural collapse during the terminal phase of deep learning training |
title_full | Prevalence of neural collapse during the terminal phase of deep learning training |
title_fullStr | Prevalence of neural collapse during the terminal phase of deep learning training |
title_full_unstemmed | Prevalence of neural collapse during the terminal phase of deep learning training |
title_short | Prevalence of neural collapse during the terminal phase of deep learning training |
title_sort | prevalence of neural collapse during the terminal phase of deep learning training |
topic | Physical Sciences |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7547234/ https://www.ncbi.nlm.nih.gov/pubmed/32958680 http://dx.doi.org/10.1073/pnas.2015509117 |
work_keys_str_mv | AT papyanvardan prevalenceofneuralcollapseduringtheterminalphaseofdeeplearningtraining AT hanxy prevalenceofneuralcollapseduringtheterminalphaseofdeeplearningtraining AT donohodavidl prevalenceofneuralcollapseduringtheterminalphaseofdeeplearningtraining |