Cargando…

Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly

When confronted with massive data streams, summarizing data with dimension reduction methods such as PCA raises theoretical and algorithmic pitfalls. A principal curve acts as a nonlinear generalization of PCA, and the present paper proposes a novel algorithm to automatically and sequentially learn...

Descripción completa

Detalles Bibliográficos
Autores principales: Li, Le, Guedj, Benjamin
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8622390/
https://www.ncbi.nlm.nih.gov/pubmed/34828234
http://dx.doi.org/10.3390/e23111534
_version_ 1784605682359599104
author Li, Le
Guedj, Benjamin
author_facet Li, Le
Guedj, Benjamin
author_sort Li, Le
collection PubMed
description When confronted with massive data streams, summarizing data with dimension reduction methods such as PCA raises theoretical and algorithmic pitfalls. A principal curve acts as a nonlinear generalization of PCA, and the present paper proposes a novel algorithm to automatically and sequentially learn principal curves from data streams. We show that our procedure is supported by regret bounds with optimal sublinear remainder terms. A greedy local search implementation (called slpc, for sequential learning principal curves) that incorporates both sleeping experts and multi-armed bandit ingredients is presented, along with its regret computation and performance on synthetic and real-life data.
format Online
Article
Text
id pubmed-8622390
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-86223902021-11-27 Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly Li, Le Guedj, Benjamin Entropy (Basel) Article When confronted with massive data streams, summarizing data with dimension reduction methods such as PCA raises theoretical and algorithmic pitfalls. A principal curve acts as a nonlinear generalization of PCA, and the present paper proposes a novel algorithm to automatically and sequentially learn principal curves from data streams. We show that our procedure is supported by regret bounds with optimal sublinear remainder terms. A greedy local search implementation (called slpc, for sequential learning principal curves) that incorporates both sleeping experts and multi-armed bandit ingredients is presented, along with its regret computation and performance on synthetic and real-life data. MDPI 2021-11-18 /pmc/articles/PMC8622390/ /pubmed/34828234 http://dx.doi.org/10.3390/e23111534 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Li, Le
Guedj, Benjamin
Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly
title Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly
title_full Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly
title_fullStr Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly
title_full_unstemmed Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly
title_short Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly
title_sort sequential learning of principal curves: summarizing data streams on the fly
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8622390/
https://www.ncbi.nlm.nih.gov/pubmed/34828234
http://dx.doi.org/10.3390/e23111534
work_keys_str_mv AT lile sequentiallearningofprincipalcurvessummarizingdatastreamsonthefly
AT guedjbenjamin sequentiallearningofprincipalcurvessummarizingdatastreamsonthefly