Optimal errors and phase transitions in high-dimensional generalized linear models

Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or “free entropy”) from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance and locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms.
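The random-GLM setting the abstract describes can be sketched in a few lines: a ground-truth signal of dimension d, a random (Gaussian) data matrix, n observations passed through an output channel, with the ratio n/d held fixed. The sketch below is purely illustrative (the dimension, ratio, and the sign() channel giving the noiseless perceptron are chosen for concreteness, not taken from the paper); it sets up an instance and reports a trivial baseline error, whereas the paper characterizes the Bayes-optimal error and the performance of generalized approximate message passing.

```python
import numpy as np

# Illustrative sketch of the random-GLM setup: n samples in dimension d,
# with the ratio alpha = n/d fixed as both grow large.
rng = np.random.default_rng(0)

d, alpha = 500, 2.0                           # illustrative values
n = int(alpha * d)

x_star = rng.standard_normal(d)               # ground-truth signal, i.i.d. prior
A = rng.standard_normal((n, d)) / np.sqrt(d)  # random Gaussian data matrix

# Output channel phi; sign() gives the noiseless perceptron special case.
y = np.sign(A @ x_star)

# Baseline: mean-squared error of the trivial estimate x = 0
# (approximately the prior variance, i.e., about 1 here).
mse_zero = np.mean(x_star ** 2)
print(n, y.shape, round(float(mse_zero), 2))
```

The paper's results concern where, in the (alpha, channel) parameter plane, message passing matches the Bayes-optimal error that this kind of instance admits.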

Bibliographic Details
Main Authors: Barbier, Jean; Krzakala, Florent; Macris, Nicolas; Miolane, Léo; Zdeborová, Lenka
Format: Online Article Text
Language: English
Published: National Academy of Sciences, 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6431156/
https://www.ncbi.nlm.nih.gov/pubmed/30824595
http://dx.doi.org/10.1073/pnas.1802705116
Journal: Proc Natl Acad Sci U S A (PNAS Plus)
Published online: 2019-03-19; issue date: 2019-03-01
Copyright © 2019 the Author(s). Published by PNAS. This open access article is distributed under the Creative Commons Attribution License 4.0 (CC BY): http://creativecommons.org/licenses/by/4.0/