How the Outliers Influence the Quality of Clustering?

In this article, we evaluate the efficiency and performance of two clustering algorithms: [Formula: see text] (Agglomerative Hierarchical Clustering) and [Formula: see text]. We are aware that there are various linkage options and distance measures that influence the clustering results. We assess th...
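
The workflow the abstract describes (cluster, score the partition, remove the most outlying points, cluster again) can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the authors' code: it uses scikit-learn, synthetic blobs in place of the three real data sets (which this record does not name), Ward linkage, LOF only (scikit-learn has no COF implementation), and removal fractions of 0%, 5%, and 10% chosen purely for illustration. The helper names `dunn_index`, `remove_lof_outliers`, and `evaluate` are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, pairwise_distances
from sklearn.neighbors import LocalOutlierFactor


def dunn_index(X, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    dist = pairwise_distances(X)
    clusters = np.unique(labels)
    max_diameter = max(
        dist[np.ix_(labels == c, labels == c)].max() for c in clusters
    )
    min_separation = min(
        dist[np.ix_(labels == a, labels == b)].min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return min_separation / max_diameter


def remove_lof_outliers(X, fraction, n_neighbors=20):
    """Drop the given fraction of points with the highest LOF scores."""
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    lof.fit(X)
    scores = -lof.negative_outlier_factor_  # larger value = more outlying
    n_remove = int(fraction * len(X))
    keep = np.sort(np.argsort(scores)[: len(X) - n_remove])
    return X[keep]


def evaluate(X, n_clusters=3, linkage="ward"):
    """Cluster with AHC and return (Davies-Bouldin, Dunn) for the partition."""
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage)
    labels = model.fit_predict(X)
    return davies_bouldin_score(X, labels), dunn_index(X, labels)


# Synthetic stand-in data: three blobs plus uniformly scattered noise points.
rng = np.random.default_rng(0)
X_clean, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
X = np.vstack([X_clean, rng.uniform(-12, 12, size=(15, 2))])

results = {}
for fraction in (0.0, 0.05, 0.10):  # illustrative fractions, not the paper's
    X_f = remove_lof_outliers(X, fraction)
    db, dunn = evaluate(X_f)
    results[fraction] = (len(X_f), db, dunn)
    print(f"removed {fraction:.0%}: n={len(X_f)}, DB={db:.3f}, Dunn={dunn:.3f}")
```

Lower Davies–Bouldin values and higher Dunn values indicate more compact, better separated clusters, so comparing the printed rows gives a rough sense of whether removing outliers helped on this toy data. Whether it helps on real data depends on the data set, the linkage, and the detector's parameters.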


Bibliographic Details
Main authors: Nowak-Brzezińska, Agnieszka, Gaibei, Igor
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9324173/
https://www.ncbi.nlm.nih.gov/pubmed/35885141
http://dx.doi.org/10.3390/e24070917
_version_ 1784756742529220608
author Nowak-Brzezińska, Agnieszka
Gaibei, Igor
collection PubMed
description In this article, we evaluate the efficiency and performance of two clustering algorithms: [Formula: see text] (Agglomerative Hierarchical Clustering) and [Formula: see text]. We are aware that there are various linkage options and distance measures that influence the clustering results. We assess the quality of clustering using the Davies–Bouldin and Dunn cluster validity indexes. The main contribution of this research is to verify whether the quality of clusters without outliers is higher than those with outliers in the data. To do this, we compare and analyze outlier detection algorithms depending on the applied clustering algorithm. In our research, we use and compare the [Formula: see text] (Local Outlier Factor) and [Formula: see text] (Connectivity-based Outlier Factor) algorithms for detecting outliers before and after removing [Formula: see text], [Formula: see text], and [Formula: see text] of outliers. Next, we analyze how the quality of clustering has improved. In the experiments, three real data sets were used with a different number of instances.
format Online
Article
Text
id pubmed-9324173
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9324173 2022-07-27 How the Outliers Influence the Quality of Clustering? Nowak-Brzezińska, Agnieszka; Gaibei, Igor. Entropy (Basel). Article. MDPI 2022-06-30 /pmc/articles/PMC9324173/ /pubmed/35885141 http://dx.doi.org/10.3390/e24070917 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title How the Outliers Influence the Quality of Clustering?
topic Article