Choosing ℓ(p) norms in high-dimensional spaces based on hub analysis

The hubness phenomenon is a recently discovered aspect of the curse of dimensionality. Hub objects have a small distance to an exceptionally large number of data points while anti-hubs lie far from all other data points. A closely related problem is the concentration of distances in high-dimensional...

Bibliographic Details
Main authors: Flexer, Arthur; Schnitzer, Dominik
Format: Online Article Text
Language: English
Published: Elsevier Science Publishers 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4567076/
https://www.ncbi.nlm.nih.gov/pubmed/26640321
http://dx.doi.org/10.1016/j.neucom.2014.11.084
author Flexer, Arthur
Schnitzer, Dominik
collection PubMed
description The hubness phenomenon is a recently discovered aspect of the curse of dimensionality. Hub objects have a small distance to an exceptionally large number of data points while anti-hubs lie far from all other data points. A closely related problem is the concentration of distances in high-dimensional spaces. Previous work has already advocated the use of fractional ℓ(p) norms instead of the ubiquitous Euclidean norm to avoid the negative effects of distance concentration. However, which exact fractional norm to use is a largely unsolved problem. The contribution of this work is an empirical analysis of the relation of different ℓ(p) norms and hubness. We propose an unsupervised approach for choosing an ℓ(p) norm which minimizes hubs while simultaneously maximizing nearest neighbor classification. Our approach is evaluated on seven high-dimensional data sets and compared to three approaches that re-scale distances to avoid hubness.
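As a rough illustration of the idea in the description, the sketch below measures hubness under several ℓ(p) norms and picks the norm with the lowest value. It assumes hubness is quantified as the skewness of the k-occurrence distribution (how often each point appears among the k nearest neighbors of other points), which is a common measure but is not confirmed by this record as the authors' exact procedure. The function names, the candidate norms, and the synthetic Gaussian data are all hypothetical.

```python
import random


def lp_distance(x, y, p):
    """l_p dissimilarity between two vectors (not a true metric for 0 < p < 1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def k_occurrence(data, k, p):
    """Count how often each point is among the k nearest neighbors of another point."""
    n = len(data)
    counts = [0] * n
    for i in range(n):
        # Distances from point i to every other point; ties broken by index.
        dists = sorted(
            (lp_distance(data[i], data[j], p), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardized moment; large positive values indicate strong hubs."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def hubness_by_norm(data, k=5, norms=(0.5, 1.0, 2.0, 4.0)):
    """Hubness (k-occurrence skewness) for each candidate norm p (assumed candidates)."""
    return {p: skewness(k_occurrence(data, k, p)) for p in norms}


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in data: 120 points in 50 dimensions (an assumption).
    data = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(120)]
    scores = hubness_by_norm(data)
    for p, s in sorted(scores.items()):
        print(f"p={p}: hubness={s:.3f}")
    print("norm with lowest hubness:", min(scores, key=scores.get))
```

Selecting by hubness alone needs no class labels, which is what makes such a choice unsupervised; checking nearest neighbor classification accuracy afterwards would require labeled data and is left out of this sketch.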
format Online
Article
Text
id pubmed-4567076
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Elsevier Science Publishers
record_format MEDLINE/PubMed
spelling pubmed-4567076 2015-12-02. Choosing ℓ(p) norms in high-dimensional spaces based on hub analysis. Flexer, Arthur; Schnitzer, Dominik. Neurocomputing. Article. Elsevier Science Publishers, 2015-12-02. /pmc/articles/PMC4567076/ /pubmed/26640321 http://dx.doi.org/10.1016/j.neucom.2014.11.084 Text, en. © 2015 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Choosing ℓ(p) norms in high-dimensional spaces based on hub analysis
topic Article