Stochastic quasi-gradient methods: variance reduction via Jacobian sketching
We develop a new family of variance reduced stochastic gradient descent methods for minimizing the average of a very large number of smooth functions. Our method—JacSketch—is motivated by novel developments in randomized numerical linear algebra, and operates by maintaining a stochastic estimate of a Jacobian matrix composed of the gradients of individual functions.
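For intuition, the sketch-and-project update described in the abstract can be written as a Frobenius-norm projection. The rendering below uses notation of my own choosing rather than quoting the paper: $\mathbf{J}^k \in \mathbb{R}^{d \times n}$ is the current Jacobian estimate, $\nabla \mathbf{F}(x^k)$ is the true Jacobian whose columns are the gradients $\nabla f_i(x^k)$, and $\mathbf{S}_k$ is a random sketching matrix.

```latex
% Project the previous estimate onto the set of matrices consistent with
% the sketched measurement \nabla F(x^k) S_k (unweighted Frobenius norm):
\mathbf{J}^{k+1} = \arg\min_{\mathbf{J}} \; \|\mathbf{J} - \mathbf{J}^k\|_F^2
  \quad \text{subject to} \quad
  \mathbf{J}\,\mathbf{S}_k = \nabla \mathbf{F}(x^k)\,\mathbf{S}_k,
% which admits the closed form
\mathbf{J}^{k+1} = \mathbf{J}^k
  + \bigl(\nabla \mathbf{F}(x^k) - \mathbf{J}^k\bigr)\,
    \mathbf{S}_k \bigl(\mathbf{S}_k^\top \mathbf{S}_k\bigr)^{\dagger} \mathbf{S}_k^\top .
```

When $\mathbf{S}_k$ is a single coordinate vector $e_i$, the projector $\mathbf{S}_k (\mathbf{S}_k^\top \mathbf{S}_k)^{\dagger} \mathbf{S}_k^\top = e_i e_i^\top$ simply overwrites column $i$ of the estimate, and the resulting method is SAGA, as the abstract notes. A minimal Python sketch of that special case follows; the ridge-regression toy problem, the function names, and the step size are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def jacsketch_saga(grad_i, n, d, x0, lr, iters, seed=0):
    """JacSketch with single-coordinate sketches S_k = e_i, i.e. SAGA.

    grad_i(x, i) must return the gradient of the i-th function f_i at x.
    J is the d x n Jacobian estimate; column j approximates grad f_j(x).
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    J = np.zeros((d, n))       # Jacobian estimate, one column per f_i
    j_avg = J.mean(axis=1)     # running average of its columns
    for _ in range(iters):
        i = rng.integers(n)    # uniform sampling (the paper also covers importance sampling)
        g_new = grad_i(x, i)
        # Variance-reduced, unbiased gradient estimator (SAGA form):
        g = j_avg + g_new - J[:, i]
        # Sketch-and-project with S = e_i just replaces column i;
        # update the running average before overwriting the column.
        j_avg += (g_new - J[:, i]) / n
        J[:, i] = g_new
        x -= lr * g
    return x

# Toy problem (illustrative): ridge regression,
# f_i(x) = 0.5 * (a_i @ x - b_i)**2 + 0.5 * lam * ||x||^2.
rng = np.random.default_rng(1)
n, d, lam = 200, 10, 0.1
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)
grad_i = lambda x, i: A[i] * (A[i] @ x - b[i]) + lam * x
x_star = np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ b / n)
x_hat = jacsketch_saga(grad_i, n, d, np.zeros(d), lr=0.01, iters=30000)
print("distance to minimizer:", np.linalg.norm(x_hat - x_star))
```

Maintaining the column average `j_avg` in place keeps each iteration at O(d) cost beyond the single gradient evaluation, which is what makes this special case practical for large n.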
Main Authors: | Gower, Robert M.; Richtárik, Peter; Bach, Francis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Berlin Heidelberg, 2020 |
Subjects: | Full Length Paper |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8550794/ https://www.ncbi.nlm.nih.gov/pubmed/34720193 http://dx.doi.org/10.1007/s10107-020-01506-0 |
Field | Value
---|---
_version_ | 1784591032468373504 |
author | Gower, Robert M.; Richtárik, Peter; Bach, Francis |
author_facet | Gower, Robert M.; Richtárik, Peter; Bach, Francis |
author_sort | Gower, Robert M. |
collection | PubMed |
description | We develop a new family of variance reduced stochastic gradient descent methods for minimizing the average of a very large number of smooth functions. Our method—JacSketch—is motivated by novel developments in randomized numerical linear algebra, and operates by maintaining a stochastic estimate of a Jacobian matrix composed of the gradients of individual functions. In each iteration, JacSketch efficiently updates the Jacobian matrix by first obtaining a random linear measurement of the true Jacobian through (cheap) sketching, and then projecting the previous estimate onto the solution space of a linear matrix equation whose solutions are consistent with the measurement. The Jacobian estimate is then used to compute a variance-reduced unbiased estimator of the gradient. Our strategy is analogous to the way quasi-Newton methods maintain an estimate of the Hessian, and hence our method can be seen as a stochastic quasi-gradient method. Our method can also be seen as stochastic gradient descent applied to a controlled stochastic optimization reformulation of the original problem, where the control comes from the Jacobian estimates. We prove that for smooth and strongly convex functions, JacSketch converges linearly with a meaningful rate dictated by a single convergence theorem which applies to general sketches. We also provide a refined convergence theorem which applies to a smaller class of sketches, featuring a novel proof technique based on a stochastic Lyapunov function. This enables us to obtain sharper complexity results for variants of JacSketch with importance sampling. By specializing our general approach to specific sketching strategies, JacSketch reduces to the celebrated stochastic average gradient (SAGA) method, and its several existing and many new minibatch, reduced memory, and importance sampling variants. Our rate for SAGA with importance sampling is the current best-known rate for this method, resolving a conjecture by Schmidt et al. (Proceedings of the eighteenth international conference on artificial intelligence and statistics, AISTATS 2015, San Diego, California, 2015). The rates we obtain for minibatch SAGA are also superior to existing rates and are sufficiently tight as to show a decrease in total complexity as the minibatch size increases. Moreover, we obtain the first minibatch SAGA method with importance sampling. |
format | Online Article Text |
id | pubmed-8550794 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Springer Berlin Heidelberg |
record_format | MEDLINE/PubMed |
spelling | pubmed-8550794 2021-10-29 Stochastic quasi-gradient methods: variance reduction via Jacobian sketching. Gower, Robert M.; Richtárik, Peter; Bach, Francis. Math Program, Full Length Paper. Springer Berlin Heidelberg, 2020-05-12. /pmc/articles/PMC8550794/ /pubmed/34720193 http://dx.doi.org/10.1007/s10107-020-01506-0. Text en. © The Author(s) 2020. Open Access under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Full Length Paper Gower, Robert M. Richtárik, Peter Bach, Francis Stochastic quasi-gradient methods: variance reduction via Jacobian sketching |
title | Stochastic quasi-gradient methods: variance reduction via Jacobian sketching |
title_full | Stochastic quasi-gradient methods: variance reduction via Jacobian sketching |
title_fullStr | Stochastic quasi-gradient methods: variance reduction via Jacobian sketching |
title_full_unstemmed | Stochastic quasi-gradient methods: variance reduction via Jacobian sketching |
title_short | Stochastic quasi-gradient methods: variance reduction via Jacobian sketching |
title_sort | stochastic quasi-gradient methods: variance reduction via jacobian sketching |
topic | Full Length Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8550794/ https://www.ncbi.nlm.nih.gov/pubmed/34720193 http://dx.doi.org/10.1007/s10107-020-01506-0 |
work_keys_str_mv | AT gowerrobertm stochasticquasigradientmethodsvariancereductionviajacobiansketching AT richtarikpeter stochasticquasigradientmethodsvariancereductionviajacobiansketching AT bachfrancis stochasticquasigradientmethodsvariancereductionviajacobiansketching |