Bayesian Data Selection

Insights into complex, high-dimensional data can be obtained by discovering features of the data that match or do not match a model of interest. To formalize this task, we introduce the "data selection" problem: finding a lower-dimensional statistic—such as a subset of variables—that is we...

Bibliographic Details
Main authors: Weinstein, Eli N.; Miller, Jeffrey W.
Format: Online Article Text
Language: English
Published: 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10194814/
https://www.ncbi.nlm.nih.gov/pubmed/37206375
author Weinstein, Eli N.
Miller, Jeffrey W.
collection PubMed
description Insights into complex, high-dimensional data can be obtained by discovering features of the data that match or do not match a model of interest. To formalize this task, we introduce the "data selection" problem: finding a lower-dimensional statistic—such as a subset of variables—that is well fit by a given parametric model of interest. A fully Bayesian approach to data selection would be to parametrically model the value of the statistic, nonparametrically model the remaining "background" components of the data, and perform standard Bayesian model selection for the choice of statistic. However, fitting a nonparametric model to high-dimensional data tends to be highly inefficient, statistically and computationally. We propose a novel score for performing data selection, the "Stein volume criterion (SVC)", that does not require fitting a nonparametric model. The SVC takes the form of a generalized marginal likelihood with a kernelized Stein discrepancy in place of the Kullback–Leibler divergence. We prove that the SVC is consistent for data selection, and establish consistency and asymptotic normality of the corresponding generalized posterior on parameters. We apply the SVC to the analysis of single-cell RNA sequencing data sets using probabilistic principal components analysis and a spin glass model of gene regulation.
format Online
Article
Text
id pubmed-10194814
institution National Center for Biotechnology Information
language English
publishDate 2023
record_format MEDLINE/PubMed
spelling pubmed-10194814 2023-05-18 Bayesian Data Selection Weinstein, Eli N. Miller, Jeffrey W. J Mach Learn Res Article 2023 /pmc/articles/PMC10194814/ /pubmed/37206375 Text en License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at https://jmlr.org/papers/v24/21-1067.html
title Bayesian Data Selection
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10194814/
https://www.ncbi.nlm.nih.gov/pubmed/37206375