
Exact Gaussian processes for massive datasets via non-stationary sparsity-discovering kernels

A Gaussian Process (GP) is a prominent mathematical framework for stochastic function approximation in science and engineering applications. Its success is largely attributed to the GP’s analytical tractability, robustness, and natural inclusion of uncertainty quantification. Unfortunately, the use...


Bibliographic Details
Main Authors:	Noack, Marcus M.; Krishnan, Harinarayan; Risser, Mark D.; Reyes, Kristofer G.
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10011418/ https://www.ncbi.nlm.nih.gov/pubmed/36914705 http://dx.doi.org/10.1038/s41598-023-30062-8

_version_	1784906388100612096
author	Noack, Marcus M.; Krishnan, Harinarayan; Risser, Mark D.; Reyes, Kristofer G.
author_facet	Noack, Marcus M.; Krishnan, Harinarayan; Risser, Mark D.; Reyes, Kristofer G.
author_sort	Noack, Marcus M.
collection	PubMed
description	A Gaussian Process (GP) is a prominent mathematical framework for stochastic function approximation in science and engineering applications. Its success is largely attributed to the GP’s analytical tractability, robustness, and natural inclusion of uncertainty quantification. Unfortunately, the use of exact GPs is prohibitively expensive for large datasets due to their unfavorable numerical complexity of O(N^3) in computation and O(N^2) in storage. All existing methods addressing this issue utilize some form of approximation, usually considering subsets of the full dataset or finding representative pseudo-points that render the covariance matrix well-structured and sparse. These approximate methods can lead to inaccuracies in function approximations and often limit the user’s flexibility in designing expressive kernels. Instead of inducing sparsity via data-point geometry and structure, we propose to take advantage of naturally occurring sparsity by allowing the kernel to discover, rather than induce, sparse structure. The premise of this paper is that the data sets and physical processes modeled by GPs often exhibit natural or implicit sparsities, but commonly used kernels do not allow us to exploit such sparsity. The core concept of exact and, at the same time, sparse GPs relies on kernel definitions that provide enough flexibility to learn and encode not only non-zero but also zero covariances. This principle of ultra-flexible, compactly supported, and non-stationary kernels, combined with HPC and constrained optimization, lets us scale exact GPs well beyond 5 million data points.
format	Online Article Text
id	pubmed-10011418
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-10011418 2023-03-15 Exact Gaussian processes for massive datasets via non-stationary sparsity-discovering kernels Noack, Marcus M.; Krishnan, Harinarayan; Risser, Mark D.; Reyes, Kristofer G. Sci Rep Article A Gaussian Process (GP) is a prominent mathematical framework for stochastic function approximation in science and engineering applications. Its success is largely attributed to the GP’s analytical tractability, robustness, and natural inclusion of uncertainty quantification. Unfortunately, the use of exact GPs is prohibitively expensive for large datasets due to their unfavorable numerical complexity of O(N^3) in computation and O(N^2) in storage. All existing methods addressing this issue utilize some form of approximation, usually considering subsets of the full dataset or finding representative pseudo-points that render the covariance matrix well-structured and sparse. These approximate methods can lead to inaccuracies in function approximations and often limit the user’s flexibility in designing expressive kernels. Instead of inducing sparsity via data-point geometry and structure, we propose to take advantage of naturally occurring sparsity by allowing the kernel to discover, rather than induce, sparse structure. The premise of this paper is that the data sets and physical processes modeled by GPs often exhibit natural or implicit sparsities, but commonly used kernels do not allow us to exploit such sparsity. The core concept of exact and, at the same time, sparse GPs relies on kernel definitions that provide enough flexibility to learn and encode not only non-zero but also zero covariances. This principle of ultra-flexible, compactly supported, and non-stationary kernels, combined with HPC and constrained optimization, lets us scale exact GPs well beyond 5 million data points. 
Nature Publishing Group UK 2023-03-13 /pmc/articles/PMC10011418/ /pubmed/36914705 http://dx.doi.org/10.1038/s41598-023-30062-8 Text en © This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle	Article Noack, Marcus M. Krishnan, Harinarayan Risser, Mark D. Reyes, Kristofer G. Exact Gaussian processes for massive datasets via non-stationary sparsity-discovering kernels
title	Exact Gaussian processes for massive datasets via non-stationary sparsity-discovering kernels
title_full	Exact Gaussian processes for massive datasets via non-stationary sparsity-discovering kernels
title_fullStr	Exact Gaussian processes for massive datasets via non-stationary sparsity-discovering kernels
title_full_unstemmed	Exact Gaussian processes for massive datasets via non-stationary sparsity-discovering kernels
title_short	Exact Gaussian processes for massive datasets via non-stationary sparsity-discovering kernels
title_sort	exact gaussian processes for massive datasets via non-stationary sparsity-discovering kernels
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10011418/ https://www.ncbi.nlm.nih.gov/pubmed/36914705 http://dx.doi.org/10.1038/s41598-023-30062-8
work_keys_str_mv	AT noackmarcusm exactgaussianprocessesformassivedatasetsvianonstationarysparsitydiscoveringkernels AT krishnanharinarayan exactgaussianprocessesformassivedatasetsvianonstationarysparsitydiscoveringkernels AT rissermarkd exactgaussianprocessesformassivedatasetsvianonstationarysparsitydiscoveringkernels AT reyeskristoferg exactgaussianprocessesformassivedatasetsvianonstationarysparsitydiscoveringkernels
