Considerably Improving Clustering Algorithms Using UMAP Dimensionality Reduction Technique: A Comparative Study

Bibliographic Details
Main authors: Allaoui, Mebarka; Kherfi, Mohammed Lamine; Cheriet, Abdelhakim
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7340901/
http://dx.doi.org/10.1007/978-3-030-51935-3_34
_version_ 1783555117753040896
author Allaoui, Mebarka
Kherfi, Mohammed Lamine
Cheriet, Abdelhakim
author_facet Allaoui, Mebarka
Kherfi, Mohammed Lamine
Cheriet, Abdelhakim
author_sort Allaoui, Mebarka
collection PubMed
description Dimensionality reduction is widely used in machine learning and big data analytics since it helps to analyze and to visualize large, high-dimensional datasets. In particular, it can considerably help to perform tasks like data clustering and classification. Recently, embedding methods have emerged as a promising direction for improving clustering accuracy. They can preserve the local structure and simultaneously reveal the global structure of data, thereby reasonably improving clustering performance. In this paper, we investigate how to improve the performance of several clustering algorithms using one of the most successful embedding techniques: Uniform Manifold Approximation and Projection, or UMAP. This technique has recently been proposed as a manifold learning technique for dimensionality reduction. It is based on Riemannian geometry and algebraic topology. Our main hypothesis is that UMAP makes it possible to find the best clusterable embedding manifold, and therefore we applied it as a preprocessing step before performing clustering. We compare the results of many well-known clustering algorithms, such as k-means, HDBSCAN, GMM and Agglomerative Hierarchical Clustering, when they operate on the low-dimensional feature space yielded by UMAP. A series of experiments on several image datasets demonstrates that the proposed method allows each of the clustering algorithms studied to improve its performance on each dataset considered. Based on the accuracy measure, the improvement can reach a remarkable rate of 60%.
format Online
Article
Text
id pubmed-7340901
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7340901 2020-07-08 Considerably Improving Clustering Algorithms Using UMAP Dimensionality Reduction Technique: A Comparative Study Allaoui, Mebarka Kherfi, Mohammed Lamine Cheriet, Abdelhakim Image and Signal Processing Article Dimensionality reduction is widely used in machine learning and big data analytics since it helps to analyze and to visualize large, high-dimensional datasets. In particular, it can considerably help to perform tasks like data clustering and classification. Recently, embedding methods have emerged as a promising direction for improving clustering accuracy. They can preserve the local structure and simultaneously reveal the global structure of data, thereby reasonably improving clustering performance. In this paper, we investigate how to improve the performance of several clustering algorithms using one of the most successful embedding techniques: Uniform Manifold Approximation and Projection, or UMAP. This technique has recently been proposed as a manifold learning technique for dimensionality reduction. It is based on Riemannian geometry and algebraic topology. Our main hypothesis is that UMAP makes it possible to find the best clusterable embedding manifold, and therefore we applied it as a preprocessing step before performing clustering. We compare the results of many well-known clustering algorithms, such as k-means, HDBSCAN, GMM and Agglomerative Hierarchical Clustering, when they operate on the low-dimensional feature space yielded by UMAP. A series of experiments on several image datasets demonstrates that the proposed method allows each of the clustering algorithms studied to improve its performance on each dataset considered. Based on the accuracy measure, the improvement can reach a remarkable rate of 60%.
2020-06-05 /pmc/articles/PMC7340901/ http://dx.doi.org/10.1007/978-3-030-51935-3_34 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Allaoui, Mebarka
Kherfi, Mohammed Lamine
Cheriet, Abdelhakim
Considerably Improving Clustering Algorithms Using UMAP Dimensionality Reduction Technique: A Comparative Study
title Considerably Improving Clustering Algorithms Using UMAP Dimensionality Reduction Technique: A Comparative Study
title_full Considerably Improving Clustering Algorithms Using UMAP Dimensionality Reduction Technique: A Comparative Study
title_fullStr Considerably Improving Clustering Algorithms Using UMAP Dimensionality Reduction Technique: A Comparative Study
title_full_unstemmed Considerably Improving Clustering Algorithms Using UMAP Dimensionality Reduction Technique: A Comparative Study
title_short Considerably Improving Clustering Algorithms Using UMAP Dimensionality Reduction Technique: A Comparative Study
title_sort considerably improving clustering algorithms using umap dimensionality reduction technique: a comparative study
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7340901/
http://dx.doi.org/10.1007/978-3-030-51935-3_34
work_keys_str_mv AT allaouimebarka considerablyimprovingclusteringalgorithmsusingumapdimensionalityreductiontechniqueacomparativestudy
AT kherfimohammedlamine considerablyimprovingclusteringalgorithmsusingumapdimensionalityreductiontechniqueacomparativestudy
AT cherietabdelhakim considerablyimprovingclusteringalgorithmsusingumapdimensionalityreductiontechniqueacomparativestudy
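
The description above outlines a two-stage approach: reduce the data with UMAP, then cluster the low-dimensional embedding and score the result by accuracy. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the umap-learn and scikit-learn packages, uses k-means as the example clusterer, and picks `n_components=2` arbitrarily. The record does not define the accuracy measure; the sketch assumes the common definition based on the best one-to-one matching between cluster ids and class labels. The function names `clustering_accuracy` and `cluster_with_umap` are hypothetical.

```python
from collections import Counter
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """Fraction of points labeled correctly under the best one-to-one
    mapping from cluster ids to class labels (assumed definition)."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Co-occurrence counts: number of points in each (cluster, class) pair.
    counts = Counter(zip(y_pred, y_true))
    # Pad the shorter side with None so the matching stays one-to-one.
    size = max(len(classes), len(clusters))
    classes += [None] * (size - len(classes))
    clusters += [None] * (size - len(clusters))
    best = 0
    # Brute force over all matchings: practical for a handful of clusters only.
    for perm in permutations(classes):
        matched = sum(counts[(c, k)] for c, k in zip(clusters, perm))
        best = max(best, matched)
    return best / len(y_true)


def cluster_with_umap(X, n_clusters, n_components=2, random_state=0):
    """Embed X with UMAP, then run k-means on the embedding.

    The parameter values are illustrative assumptions, not taken from the paper.
    """
    # Third-party imports are local so the metric above works without them.
    import umap  # provided by the umap-learn package
    from sklearn.cluster import KMeans

    embedding = umap.UMAP(
        n_components=n_components, random_state=random_state
    ).fit_transform(X)
    return KMeans(
        n_clusters=n_clusters, n_init=10, random_state=random_state
    ).fit_predict(embedding)
```

Given a feature matrix `X` and ground-truth labels `y`, a comparison in this spirit would score `clustering_accuracy(y, cluster_with_umap(X, n_clusters))` against the same clusterer run directly on `X`. The brute-force matching grows factorially with the number of clusters; for larger label sets, `scipy.optimize.linear_sum_assignment` on the co-occurrence matrix is the usual replacement.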