Cargando…

Coresets for the Average Case Error for Finite Query Sets

Coreset is usually a small weighted subset of an input set of items, that provably approximates their loss function for a given set of queries (models, classifiers, hypothesis). That is, the maximum (worst-case) error over all queries is bounded. To obtain smaller coresets, we suggest a natural rela...

Descripción completa

Detalles Bibliográficos
Autores principales: Maalouf, Alaa, Jubran, Ibrahim, Tukan, Murad, Feldman, Dan
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8512382/
https://www.ncbi.nlm.nih.gov/pubmed/34641008
http://dx.doi.org/10.3390/s21196689
_version_ 1784582977582268416
author Maalouf, Alaa
Jubran, Ibrahim
Tukan, Murad
Feldman, Dan
author_facet Maalouf, Alaa
Jubran, Ibrahim
Tukan, Murad
Feldman, Dan
author_sort Maalouf, Alaa
collection PubMed
description Coreset is usually a small weighted subset of an input set of items, that provably approximates their loss function for a given set of queries (models, classifiers, hypothesis). That is, the maximum (worst-case) error over all queries is bounded. To obtain smaller coresets, we suggest a natural relaxation: coresets whose average error over the given set of queries is bounded. We provide both deterministic and randomized (generic) algorithms for computing such a coreset for any finite set of queries. Unlike most corresponding coresets for the worst-case error, the size of the coreset in this work is independent of both the input size and its Vapnik–Chervonenkis (VC) dimension. The main technique is to reduce the average-case coreset into the vector summarization problem, where the goal is to compute a weighted subset of the n input vectors which approximates their sum. We then suggest the first algorithm for computing this weighted subset in time that is linear in the input size, for [Formula: see text] , where [Formula: see text] is the approximation error, improving, e.g., both [ICML’17] and applications for principal component analysis (PCA) [NIPS’16]. Experimental results show significant and consistent improvement also in practice. Open source code is provided.
format Online
Article
Text
id pubmed-8512382
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-85123822021-10-14 Coresets for the Average Case Error for Finite Query Sets Maalouf, Alaa Jubran, Ibrahim Tukan, Murad Feldman, Dan Sensors (Basel) Article Coreset is usually a small weighted subset of an input set of items, that provably approximates their loss function for a given set of queries (models, classifiers, hypothesis). That is, the maximum (worst-case) error over all queries is bounded. To obtain smaller coresets, we suggest a natural relaxation: coresets whose average error over the given set of queries is bounded. We provide both deterministic and randomized (generic) algorithms for computing such a coreset for any finite set of queries. Unlike most corresponding coresets for the worst-case error, the size of the coreset in this work is independent of both the input size and its Vapnik–Chervonenkis (VC) dimension. The main technique is to reduce the average-case coreset into the vector summarization problem, where the goal is to compute a weighted subset of the n input vectors which approximates their sum. We then suggest the first algorithm for computing this weighted subset in time that is linear in the input size, for [Formula: see text] , where [Formula: see text] is the approximation error, improving, e.g., both [ICML’17] and applications for principal component analysis (PCA) [NIPS’16]. Experimental results show significant and consistent improvement also in practice. Open source code is provided. MDPI 2021-10-08 /pmc/articles/PMC8512382/ /pubmed/34641008 http://dx.doi.org/10.3390/s21196689 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Maalouf, Alaa
Jubran, Ibrahim
Tukan, Murad
Feldman, Dan
Coresets for the Average Case Error for Finite Query Sets
title Coresets for the Average Case Error for Finite Query Sets
title_full Coresets for the Average Case Error for Finite Query Sets
title_fullStr Coresets for the Average Case Error for Finite Query Sets
title_full_unstemmed Coresets for the Average Case Error for Finite Query Sets
title_short Coresets for the Average Case Error for Finite Query Sets
title_sort coresets for the average case error for finite query sets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8512382/
https://www.ncbi.nlm.nih.gov/pubmed/34641008
http://dx.doi.org/10.3390/s21196689
work_keys_str_mv AT maaloufalaa coresetsfortheaveragecaseerrorforfinitequerysets
AT jubranibrahim coresetsfortheaveragecaseerrorforfinitequerysets
AT tukanmurad coresetsfortheaveragecaseerrorforfinitequerysets
AT feldmandan coresetsfortheaveragecaseerrorforfinitequerysets