An Improved Density Peak Clustering Algorithm for Multi-Density Data

Density peak clustering is the latest classic density-based clustering algorithm, which can directly find the cluster center without iteration. The algorithm needs to determine a unique parameter, so the selection of parameters is particularly important. However, for multi-density data, when one par...

Bibliographic Details
Main authors: Yin, Lifeng, Wang, Yingfeng, Chen, Huayue, Deng, Wu
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9695166/
https://www.ncbi.nlm.nih.gov/pubmed/36433414
http://dx.doi.org/10.3390/s22228814
_version_ 1784837988144906240
author Yin, Lifeng
Wang, Yingfeng
Chen, Huayue
Deng, Wu
collection PubMed
description Density peak clustering (DPC) is a recent density-based clustering algorithm that finds cluster centers directly, without iteration. It depends on a single parameter, the cutoff distance d_c, so the choice of that parameter is particularly important. For multi-density data, however, no single parameter value suits all of the data, and clustering results are often poor. Moreover, selecting cluster centers subjectively from the decision graph is not very convincing and is prone to error. To cluster multi-density data better, this paper improves the density peak clustering algorithm. To select the parameter d_c, the K-nearest-neighbor idea is used: the neighbor distances of each data point are sorted, a line graph of the K-nearest-neighbor distance is drawn, and the global bifurcation point is found to divide the data into parts with different densities. To select cluster centers, the local density and distance of each data point in each data division are computed, a γ map is drawn, the average value of the γ height difference is calculated, and the largest discontinuity point is found through two screenings, which automatically determines the cluster centers and their number. The divided datasets are clustered by the DPC algorithm, and the clustering results are then refined and integrated using cluster fusion rules. Finally, experiments on a variety of artificial simulated datasets and UCI real datasets demonstrate the superiority of the proposed F-DPC algorithm in terms of clustering effect, clustering quality, and number of samples.
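The quantities named in the description can be illustrated with a minimal Python sketch. It assumes the standard density peak clustering definitions (local density as a count of neighbors within the cutoff distance, written d(c) in the record and `d_c` in the code, distance to the nearest point of higher density, and γ as their product) and a plain sorted K-nearest-neighbor distance curve. It is not the authors' F-DPC implementation: the bifurcation-point search, the two screenings on γ, and the cluster fusion rules are not specified in this record and are left out. All function names and the toy data are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sorted_knn_distance(D, k):
    """Distance from each point to its k-th nearest neighbor, sorted ascending.

    After sorting a row of D, column 0 is the point itself (distance 0),
    so column k is the k-th nearest neighbor.
    """
    kth = np.sort(D, axis=1)[:, k]
    return np.sort(kth)

def dpc_quantities(D, d_c):
    """Standard DPC quantities (assumed definitions, cutoff kernel).

    rho[i]   : number of other points closer than d_c to point i
    delta[i] : distance to the nearest point with strictly higher rho;
               for a point with no denser point, the largest distance from it
    gamma[i] : rho[i] * delta[i], large for likely cluster centers
    """
    n = D.shape[0]
    rho = (D < d_c).sum(axis=1) - 1  # subtract 1 to exclude the point itself
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    gamma = rho * delta
    return rho, delta, gamma

# Hypothetical toy data: two small groups, each a center with satellites.
X = np.array([
    [0, 0], [0, 1], [1, 0], [-1, 0], [0, -1],   # group A, center at index 0
    [10, 10], [10, 11], [11, 10], [9, 10],      # group B, center at index 5
], dtype=float)

D = pairwise_distances(X)
knn_curve = sorted_knn_distance(D, k=2)
rho, delta, gamma = dpc_quantities(D, d_c=1.1)

# Indices of the two points with the largest gamma (candidate centers).
top_two = np.argsort(gamma)[::-1][:2]
```

In this toy example the two group centers carry the largest γ values, which is the property a γ map relies on. With real data, `d_c` and `k` would have to be chosen per dataset, and ties in `rho` may need explicit handling.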
format Online
Article
Text
id pubmed-9695166
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9695166 2022-11-26 An Improved Density Peak Clustering Algorithm for Multi-Density Data Yin, Lifeng Wang, Yingfeng Chen, Huayue Deng, Wu Sensors (Basel) Article
MDPI 2022-11-15 /pmc/articles/PMC9695166/ /pubmed/36433414 http://dx.doi.org/10.3390/s22228814 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title An Improved Density Peak Clustering Algorithm for Multi-Density Data
topic Article