Optimal errors and phase transitions in high-dimensional generalized linear models
Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or “free entropy”) from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance and locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms.
Main Authors: | Barbier, Jean; Krzakala, Florent; Macris, Nicolas; Miolane, Léo; Zdeborová, Lenka |
Format: | Online Article Text |
Language: | English |
Published: | National Academy of Sciences, 2019 |
Subjects: | PNAS Plus |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6431156/ https://www.ncbi.nlm.nih.gov/pubmed/30824595 http://dx.doi.org/10.1073/pnas.1802705116 |
_version_ | 1783405893411405824 |
author | Barbier, Jean; Krzakala, Florent; Macris, Nicolas; Miolane, Léo; Zdeborová, Lenka |
author_facet | Barbier, Jean; Krzakala, Florent; Macris, Nicolas; Miolane, Léo; Zdeborová, Lenka |
author_sort | Barbier, Jean |
collection | PubMed |
description | Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or “free entropy”) from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance and locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms. |
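The random-design GLM setting the abstract describes can be sketched concretely: a random Gaussian data matrix, a "teacher" weight vector, and labels produced by a nonlinearity, with the sample-to-dimension ratio held fixed. The sketch below is illustrative, not the paper's code; the function name and default parameters are hypothetical, and `np.sign` recovers the perceptron special case mentioned in the abstract.

```python
import numpy as np

def make_glm_instance(n=200, alpha=2.0, phi=np.sign, seed=0):
    """Illustrative random-GLM benchmark instance (hypothetical helper).

    Draws a teacher vector w* ~ N(0, I_n), an m x n Gaussian data matrix
    with m = alpha * n (the ratio alpha = m/n stays fixed as n grows),
    and labels y = phi(X w* / sqrt(n)). phi = np.sign gives the perceptron.
    """
    rng = np.random.default_rng(seed)
    m = int(alpha * n)                      # number of samples at ratio alpha
    w_star = rng.standard_normal(n)         # teacher weights
    X = rng.standard_normal((m, n))         # random data matrix
    y = phi(X @ w_star / np.sqrt(n))        # observed labels
    return X, y, w_star

X, y, w_star = make_glm_instance()
print(X.shape, y.shape)   # (400, 200) (400,)
```

In this benchmark the task is to recover `w_star` (estimation error) or to predict labels for fresh rows of `X` (generalization error); the paper characterizes the Bayes-optimal values of both as functions of `alpha` and locates the phase transitions in that parameter.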
format | Online Article Text |
id | pubmed-6431156 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | National Academy of Sciences |
record_format | MEDLINE/PubMed |
spelling | pubmed-64311562019-03-28 Optimal errors and phase transitions in high-dimensional generalized linear models Barbier, Jean Krzakala, Florent Macris, Nicolas Miolane, Léo Zdeborová, Lenka Proc Natl Acad Sci U S A PNAS Plus Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or “free entropy”) from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance and locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms. National Academy of Sciences 2019-03-19 2019-03-01 /pmc/articles/PMC6431156/ /pubmed/30824595 http://dx.doi.org/10.1073/pnas.1802705116 Text en Copyright © 2019 the Author(s). Published by PNAS. http://creativecommons.org/licenses/by/4.0/ This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY) (http://creativecommons.org/licenses/by/4.0/) . |
spellingShingle | PNAS Plus Barbier, Jean Krzakala, Florent Macris, Nicolas Miolane, Léo Zdeborová, Lenka Optimal errors and phase transitions in high-dimensional generalized linear models |
title | Optimal errors and phase transitions in high-dimensional generalized linear models |
title_full | Optimal errors and phase transitions in high-dimensional generalized linear models |
title_fullStr | Optimal errors and phase transitions in high-dimensional generalized linear models |
title_full_unstemmed | Optimal errors and phase transitions in high-dimensional generalized linear models |
title_short | Optimal errors and phase transitions in high-dimensional generalized linear models |
title_sort | optimal errors and phase transitions in high-dimensional generalized linear models |
topic | PNAS Plus |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6431156/ https://www.ncbi.nlm.nih.gov/pubmed/30824595 http://dx.doi.org/10.1073/pnas.1802705116 |
work_keys_str_mv | AT barbierjean optimalerrorsandphasetransitionsinhighdimensionalgeneralizedlinearmodels AT krzakalaflorent optimalerrorsandphasetransitionsinhighdimensionalgeneralizedlinearmodels AT macrisnicolas optimalerrorsandphasetransitionsinhighdimensionalgeneralizedlinearmodels AT miolaneleo optimalerrorsandphasetransitionsinhighdimensionalgeneralizedlinearmodels AT zdeborovalenka optimalerrorsandphasetransitionsinhighdimensionalgeneralizedlinearmodels |