Optimal errors and phase transitions in high-dimensional generalized linear models
Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or “free entropy”) from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance and locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms.
Main Authors: | Barbier, Jean; Krzakala, Florent; Macris, Nicolas; Miolane, Léo; Zdeborová, Lenka |
Format: | Online Article Text |
Language: | English |
Published: | National Academy of Sciences, 2019 |
Subjects: | PNAS Plus |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6431156/ https://www.ncbi.nlm.nih.gov/pubmed/30824595 http://dx.doi.org/10.1073/pnas.1802705116 |
_version_ | 1783405893411405824 |
author | Barbier, Jean; Krzakala, Florent; Macris, Nicolas; Miolane, Léo; Zdeborová, Lenka |
author_facet | Barbier, Jean; Krzakala, Florent; Macris, Nicolas; Miolane, Léo; Zdeborová, Lenka |
author_sort | Barbier, Jean |
collection | PubMed |
description | Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or “free entropy”) from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance and locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms. |
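The random-design GLM setting the abstract describes can be sketched concretely: a random Gaussian data matrix, a "teacher" weight vector, and labels produced by a nonlinearity, with the sample-to-dimension ratio held fixed. The sketch below is illustrative, not the paper's code; the function name and default parameters are hypothetical, and `np.sign` recovers the perceptron special case mentioned in the abstract.

```python
import numpy as np

def make_glm_instance(n=200, alpha=2.0, phi=np.sign, seed=0):
    """Illustrative random-GLM benchmark instance (hypothetical helper).

    Draws a teacher vector w* ~ N(0, I_n), an m x n Gaussian data matrix
    with m = alpha * n (the ratio alpha = m/n stays fixed as n grows),
    and labels y = phi(X w* / sqrt(n)). phi = np.sign gives the perceptron.
    """
    rng = np.random.default_rng(seed)
    m = int(alpha * n)                      # number of samples at ratio alpha
    w_star = rng.standard_normal(n)         # teacher weights
    X = rng.standard_normal((m, n))         # random data matrix
    y = phi(X @ w_star / np.sqrt(n))        # observed labels
    return X, y, w_star

X, y, w_star = make_glm_instance()
print(X.shape, y.shape)   # (400, 200) (400,)
```

In this benchmark the task is to recover `w_star` (estimation error) or to predict labels for fresh rows of `X` (generalization error); the paper characterizes the Bayes-optimal values of both as functions of `alpha` and locates the phase transitions in that parameter.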
format | Online Article Text |
id | pubmed-6431156 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | National Academy of Sciences |
record_format | MEDLINE/PubMed |
spelling | pubmed-64311562019-03-28 Optimal errors and phase transitions in high-dimensional generalized linear models Barbier, Jean Krzakala, Florent Macris, Nicolas Miolane, Léo Zdeborová, Lenka Proc Natl Acad Sci U S A PNAS Plus Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or “free entropy”) from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance and locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms. National Academy of Sciences 2019-03-19 2019-03-01 /pmc/articles/PMC6431156/ /pubmed/30824595 http://dx.doi.org/10.1073/pnas.1802705116 Text en Copyright © 2019 the Author(s). Published by PNAS. http://creativecommons.org/licenses/by/4.0/ This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY) (http://creativecommons.org/licenses/by/4.0/) . |
spellingShingle | PNAS Plus Barbier, Jean Krzakala, Florent Macris, Nicolas Miolane, Léo Zdeborová, Lenka Optimal errors and phase transitions in high-dimensional generalized linear models |
title | Optimal errors and phase transitions in high-dimensional generalized linear models |
title_full | Optimal errors and phase transitions in high-dimensional generalized linear models |
title_fullStr | Optimal errors and phase transitions in high-dimensional generalized linear models |
title_full_unstemmed | Optimal errors and phase transitions in high-dimensional generalized linear models |
title_short | Optimal errors and phase transitions in high-dimensional generalized linear models |
title_sort | optimal errors and phase transitions in high-dimensional generalized linear models |
topic | PNAS Plus |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6431156/ https://www.ncbi.nlm.nih.gov/pubmed/30824595 http://dx.doi.org/10.1073/pnas.1802705116 |
work_keys_str_mv | AT barbierjean optimalerrorsandphasetransitionsinhighdimensionalgeneralizedlinearmodels AT krzakalaflorent optimalerrorsandphasetransitionsinhighdimensionalgeneralizedlinearmodels AT macrisnicolas optimalerrorsandphasetransitionsinhighdimensionalgeneralizedlinearmodels AT miolaneleo optimalerrorsandphasetransitionsinhighdimensionalgeneralizedlinearmodels AT zdeborovalenka optimalerrorsandphasetransitionsinhighdimensionalgeneralizedlinearmodels |