Cargando…
Coresets for the Average Case Error for Finite Query Sets
Coreset is usually a small weighted subset of an input set of items, that provably approximates their loss function for a given set of queries (models, classifiers, hypothesis). That is, the maximum (worst-case) error over all queries is bounded. To obtain smaller coresets, we suggest a natural rela...
Autores principales: | , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
MDPI
2021
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8512382/ https://www.ncbi.nlm.nih.gov/pubmed/34641008 http://dx.doi.org/10.3390/s21196689 |
_version_ | 1784582977582268416 |
---|---|
author | Maalouf, Alaa Jubran, Ibrahim Tukan, Murad Feldman, Dan |
author_facet | Maalouf, Alaa Jubran, Ibrahim Tukan, Murad Feldman, Dan |
author_sort | Maalouf, Alaa |
collection | PubMed |
description | Coreset is usually a small weighted subset of an input set of items, that provably approximates their loss function for a given set of queries (models, classifiers, hypothesis). That is, the maximum (worst-case) error over all queries is bounded. To obtain smaller coresets, we suggest a natural relaxation: coresets whose average error over the given set of queries is bounded. We provide both deterministic and randomized (generic) algorithms for computing such a coreset for any finite set of queries. Unlike most corresponding coresets for the worst-case error, the size of the coreset in this work is independent of both the input size and its Vapnik–Chervonenkis (VC) dimension. The main technique is to reduce the average-case coreset into the vector summarization problem, where the goal is to compute a weighted subset of the n input vectors which approximates their sum. We then suggest the first algorithm for computing this weighted subset in time that is linear in the input size, for [Formula: see text] , where [Formula: see text] is the approximation error, improving, e.g., both [ICML’17] and applications for principal component analysis (PCA) [NIPS’16]. Experimental results show significant and consistent improvement also in practice. Open source code is provided. |
format | Online Article Text |
id | pubmed-8512382 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-85123822021-10-14 Coresets for the Average Case Error for Finite Query Sets Maalouf, Alaa Jubran, Ibrahim Tukan, Murad Feldman, Dan Sensors (Basel) Article Coreset is usually a small weighted subset of an input set of items, that provably approximates their loss function for a given set of queries (models, classifiers, hypothesis). That is, the maximum (worst-case) error over all queries is bounded. To obtain smaller coresets, we suggest a natural relaxation: coresets whose average error over the given set of queries is bounded. We provide both deterministic and randomized (generic) algorithms for computing such a coreset for any finite set of queries. Unlike most corresponding coresets for the worst-case error, the size of the coreset in this work is independent of both the input size and its Vapnik–Chervonenkis (VC) dimension. The main technique is to reduce the average-case coreset into the vector summarization problem, where the goal is to compute a weighted subset of the n input vectors which approximates their sum. We then suggest the first algorithm for computing this weighted subset in time that is linear in the input size, for [Formula: see text] , where [Formula: see text] is the approximation error, improving, e.g., both [ICML’17] and applications for principal component analysis (PCA) [NIPS’16]. Experimental results show significant and consistent improvement also in practice. Open source code is provided. MDPI 2021-10-08 /pmc/articles/PMC8512382/ /pubmed/34641008 http://dx.doi.org/10.3390/s21196689 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Maalouf, Alaa Jubran, Ibrahim Tukan, Murad Feldman, Dan Coresets for the Average Case Error for Finite Query Sets |
title | Coresets for the Average Case Error for Finite Query Sets |
title_full | Coresets for the Average Case Error for Finite Query Sets |
title_fullStr | Coresets for the Average Case Error for Finite Query Sets |
title_full_unstemmed | Coresets for the Average Case Error for Finite Query Sets |
title_short | Coresets for the Average Case Error for Finite Query Sets |
title_sort | coresets for the average case error for finite query sets |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8512382/ https://www.ncbi.nlm.nih.gov/pubmed/34641008 http://dx.doi.org/10.3390/s21196689 |
work_keys_str_mv | AT maaloufalaa coresetsfortheaveragecaseerrorforfinitequerysets AT jubranibrahim coresetsfortheaveragecaseerrorforfinitequerysets AT tukanmurad coresetsfortheaveragecaseerrorforfinitequerysets AT feldmandan coresetsfortheaveragecaseerrorforfinitequerysets |