
A comprehensive empirical comparison of hubness reduction in high-dimensional spaces

Hubness is an aspect of the curse of dimensionality related to the distance concentration effect. Hubs occur in high-dimensional data spaces as objects that are particularly often among the nearest neighbors of other objects. Conversely, other data objects become antihubs, which are rarely or never...


Bibliographic Details
Main Authors:	Feldbauer, Roman; Flexer, Arthur
Format:	Online Article Text
Language:	English
Published:	Springer London 2018
Subjects:	Regular Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7327987/ https://www.ncbi.nlm.nih.gov/pubmed/32647403 http://dx.doi.org/10.1007/s10115-018-1205-y

_version_	1783552666441351168
author	Feldbauer, Roman Flexer, Arthur
collection	PubMed
description	Hubness is an aspect of the curse of dimensionality related to the distance concentration effect. Hubs occur in high-dimensional data spaces as objects that are particularly often among the nearest neighbors of other objects. Conversely, other data objects become antihubs, which are rarely or never nearest neighbors to other objects. Many machine learning algorithms rely on nearest neighbor search and some form of measuring distances, which are both impaired by high hubness. Degraded performance due to hubness has been reported for various tasks such as classification, clustering, regression, visualization, recommendation, retrieval and outlier detection. Several hubness reduction methods based on different paradigms have previously been developed. Local and global scaling as well as shared neighbors approaches aim at repairing asymmetric neighborhood relations. Global and localized centering try to eliminate spatial centrality, while the related global and local dissimilarity measures are based on density gradient flattening. Additional methods and alternative dissimilarity measures that were argued to mitigate detrimental effects of distance concentration also influence the related hubness phenomenon. In this paper, we present a large-scale empirical evaluation of all available unsupervised hubness reduction methods and dissimilarity measures. We investigate several aspects of hubness reduction as well as its influence on data semantics which we measure via nearest neighbor classification. Scaling and density gradient flattening methods improve evaluation measures such as hubness and classification accuracy consistently for data sets from a wide range of domains, while centering approaches achieve the same only under specific settings.
format	Online Article Text
id	pubmed-7327987
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Springer London
record_format	MEDLINE/PubMed
spelling	pubmed-7327987 2020-07-07 A comprehensive empirical comparison of hubness reduction in high-dimensional spaces Feldbauer, Roman Flexer, Arthur Knowl Inf Syst Regular Paper
Springer London 2018-05-18 2019 /pmc/articles/PMC7327987/ /pubmed/32647403 http://dx.doi.org/10.1007/s10115-018-1205-y Text en © The Author(s) 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title	A comprehensive empirical comparison of hubness reduction in high-dimensional spaces
topic	Regular Paper
