Cargando…

Fractional Norms and Quasinorms Do Not Help to Overcome the Curse of Dimensionality

The curse of dimensionality causes the well-known and widely discussed problems for machine learning methods. There is a hypothesis that using the Manhattan distance and even fractional [Formula: see text] quasinorms (for p less than 1) can help to overcome the curse of dimensionality in classificat...

Descripción completa

Detalles Bibliográficos
Autores principales: Mirkes, Evgeny M., Allohibi, Jeza, Gorban, Alexander
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7597215/
https://www.ncbi.nlm.nih.gov/pubmed/33286874
http://dx.doi.org/10.3390/e22101105
Descripción
Sumario:The curse of dimensionality causes the well-known and widely discussed problems for machine learning methods. There is a hypothesis that using the Manhattan distance and even fractional [Formula: see text] quasinorms (for p less than 1) can help to overcome the curse of dimensionality in classification problems. In this study, we systematically test this hypothesis. It is illustrated that fractional quasinorms have a greater relative contrast and coefficient of variation than the Euclidean norm [Formula: see text] , but it is shown that this difference decays with increasing space dimension. It has been demonstrated that the concentration of distances shows qualitatively the same behaviour for all tested norms and quasinorms. It is shown that a greater relative contrast does not mean a better classification quality. It was revealed that for different databases the best (worst) performance was achieved under different norms (quasinorms). A systematic comparison shows that the difference in the performance of kNN classifiers for [Formula: see text] at p = 0.5, 1, and 2 is statistically insignificant. Analysis of curse and blessing of dimensionality requires careful definition of data dimensionality that rarely coincides with the number of attributes. We systematically examined several intrinsic dimensions of the data.