Cargando…

Coresets for the Average Case Error for Finite Query Sets

Coreset is usually a small weighted subset of an input set of items, that provably approximates their loss function for a given set of queries (models, classifiers, hypothesis). That is, the maximum (worst-case) error over all queries is bounded. To obtain smaller coresets, we suggest a natural rela...

Descripción completa

Detalles Bibliográficos
Autores principales:	Maalouf, Alaa, Jubran, Ibrahim, Tukan, Murad, Feldman, Dan
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	MDPI 2021
Materias:	Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8512382/ https://www.ncbi.nlm.nih.gov/pubmed/34641008 http://dx.doi.org/10.3390/s21196689

_version_	1784582977582268416
author	Maalouf, Alaa Jubran, Ibrahim Tukan, Murad Feldman, Dan
author_facet	Maalouf, Alaa Jubran, Ibrahim Tukan, Murad Feldman, Dan
author_sort	Maalouf, Alaa
collection	PubMed
description	Coreset is usually a small weighted subset of an input set of items, that provably approximates their loss function for a given set of queries (models, classifiers, hypothesis). That is, the maximum (worst-case) error over all queries is bounded. To obtain smaller coresets, we suggest a natural relaxation: coresets whose average error over the given set of queries is bounded. We provide both deterministic and randomized (generic) algorithms for computing such a coreset for any finite set of queries. Unlike most corresponding coresets for the worst-case error, the size of the coreset in this work is independent of both the input size and its Vapnik–Chervonenkis (VC) dimension. The main technique is to reduce the average-case coreset into the vector summarization problem, where the goal is to compute a weighted subset of the n input vectors which approximates their sum. We then suggest the first algorithm for computing this weighted subset in time that is linear in the input size, for [Formula: see text] , where [Formula: see text] is the approximation error, improving, e.g., both [ICML’17] and applications for principal component analysis (PCA) [NIPS’16]. Experimental results show significant and consistent improvement also in practice. Open source code is provided.
format	Online Article Text
id	pubmed-8512382
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-85123822021-10-14 Coresets for the Average Case Error for Finite Query Sets Maalouf, Alaa Jubran, Ibrahim Tukan, Murad Feldman, Dan Sensors (Basel) Article Coreset is usually a small weighted subset of an input set of items, that provably approximates their loss function for a given set of queries (models, classifiers, hypothesis). That is, the maximum (worst-case) error over all queries is bounded. To obtain smaller coresets, we suggest a natural relaxation: coresets whose average error over the given set of queries is bounded. We provide both deterministic and randomized (generic) algorithms for computing such a coreset for any finite set of queries. Unlike most corresponding coresets for the worst-case error, the size of the coreset in this work is independent of both the input size and its Vapnik–Chervonenkis (VC) dimension. The main technique is to reduce the average-case coreset into the vector summarization problem, where the goal is to compute a weighted subset of the n input vectors which approximates their sum. We then suggest the first algorithm for computing this weighted subset in time that is linear in the input size, for [Formula: see text] , where [Formula: see text] is the approximation error, improving, e.g., both [ICML’17] and applications for principal component analysis (PCA) [NIPS’16]. Experimental results show significant and consistent improvement also in practice. Open source code is provided. MDPI 2021-10-08 /pmc/articles/PMC8512382/ /pubmed/34641008 http://dx.doi.org/10.3390/s21196689 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Maalouf, Alaa Jubran, Ibrahim Tukan, Murad Feldman, Dan Coresets for the Average Case Error for Finite Query Sets
title	Coresets for the Average Case Error for Finite Query Sets
title_full	Coresets for the Average Case Error for Finite Query Sets
title_fullStr	Coresets for the Average Case Error for Finite Query Sets
title_full_unstemmed	Coresets for the Average Case Error for Finite Query Sets
title_short	Coresets for the Average Case Error for Finite Query Sets
title_sort	coresets for the average case error for finite query sets
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8512382/ https://www.ncbi.nlm.nih.gov/pubmed/34641008 http://dx.doi.org/10.3390/s21196689
work_keys_str_mv	AT maaloufalaa coresetsfortheaveragecaseerrorforfinitequerysets AT jubranibrahim coresetsfortheaveragecaseerrorforfinitequerysets AT tukanmurad coresetsfortheaveragecaseerrorforfinitequerysets AT feldmandan coresetsfortheaveragecaseerrorforfinitequerysets

Coresets for the Average Case Error for Finite Query Sets

Ejemplares similares