An Improved Density Peak Clustering Algorithm for Multi-Density Data
Density peak clustering is the latest classic density-based clustering algorithm, which can directly find the cluster center without iteration. The algorithm needs to determine a unique parameter, so the selection of parameters is particularly important. However, for multi-density data, when one par...
Main authors: | Yin, Lifeng; Wang, Yingfeng; Chen, Huayue; Deng, Wu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9695166/ https://www.ncbi.nlm.nih.gov/pubmed/36433414 http://dx.doi.org/10.3390/s22228814 |
_version_ | 1784837988144906240 |
---|---|
author | Yin, Lifeng Wang, Yingfeng Chen, Huayue Deng, Wu |
author_facet | Yin, Lifeng Wang, Yingfeng Chen, Huayue Deng, Wu |
author_sort | Yin, Lifeng |
collection | PubMed |
description | Density peak clustering is the latest classic density-based clustering algorithm, which can directly find the cluster center without iteration. The algorithm needs to determine a unique parameter, so the selection of parameters is particularly important. However, for multi-density data, when one parameter cannot satisfy all data, clustering often cannot achieve good results. Moreover, the subjective selection of cluster centers through decision diagrams is often not very convincing, and there are also certain errors. In view of the above problems, in order to achieve better clustering of multi-density data, this paper improves the density peak clustering algorithm. Aiming at the selection of parameter d(c), the K-nearest neighbor idea is used to sort the neighbor distance of each data, draw a line graph of the K-nearest neighbor distance, and find the global bifurcation point to divide the data with different densities. Aiming at the selection of cluster centers, the local density and distance of each data point in each data division is found, a γ map is drawn, the average value of the γ height difference is calculated, and through two screenings the largest discontinuity point is found to automatically determine the cluster center and the number of cluster centers. The divided datasets are clustered by the DPC algorithm, and then the clustering results are perfected and integrated by using the cluster fusion rules. Finally, a variety of experiments are designed from various perspectives on various artificial simulated datasets and UCI real datasets, which demonstrate the superiority of the F-DPC algorithm in terms of clustering effect, clustering quality, and number of samples. |
format | Online Article Text |
id | pubmed-9695166 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9695166 2022-11-26 An Improved Density Peak Clustering Algorithm for Multi-Density Data Yin, Lifeng Wang, Yingfeng Chen, Huayue Deng, Wu Sensors (Basel) Article MDPI 2022-11-15 /pmc/articles/PMC9695166/ /pubmed/36433414 http://dx.doi.org/10.3390/s22228814 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Yin, Lifeng Wang, Yingfeng Chen, Huayue Deng, Wu An Improved Density Peak Clustering Algorithm for Multi-Density Data |
title | An Improved Density Peak Clustering Algorithm for Multi-Density Data |
title_sort | improved density peak clustering algorithm for multi-density data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9695166/ https://www.ncbi.nlm.nih.gov/pubmed/36433414 http://dx.doi.org/10.3390/s22228814 |
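
The description field outlines two steps that a short sketch may make easier to follow: sorting the K-nearest-neighbor distances to divide data of different densities, and using γ (local density multiplied by distance) to pick cluster centers automatically. The Python below is only an illustration under stated assumptions, not the authors' F-DPC implementation. The function names, the Gaussian density kernel, the "largest jump" rule standing in for the global bifurcation point, the "largest gap" rule standing in for the two-stage γ screening, the per-group choice of d_c, and the toy data are all hypothetical, and the cluster fusion rules are not shown.

```python
import math


def pairwise(points):
    """Full Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def split_by_knn_jump(points, k):
    """Split point indices into a denser and a sparser group.

    Assumption: the 'bifurcation point' is approximated by the largest
    jump in the ascending curve of k-th nearest-neighbor distances.
    """
    dist = pairwise(points)
    # Index 0 of each sorted row is the point itself (distance 0).
    kth = [sorted(row)[k] for row in dist]
    order = sorted(range(len(points)), key=lambda i: kth[i])
    jumps = [kth[order[i + 1]] - kth[order[i]] for i in range(len(order) - 1)]
    cut = max(range(len(jumps)), key=lambda i: jumps[i])
    return sorted(order[:cut + 1]), sorted(order[cut + 1:]), kth


def gamma_scores(points, d_c):
    """gamma_i = rho_i * delta_i, as in density peak clustering.

    Assumption: rho uses a Gaussian kernel with cutoff distance d_c.
    """
    dist = pairwise(points)
    n = len(points)
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    delta = []
    for i in range(n):
        # Distance to the nearest point of strictly higher density.
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # The densest point has no such neighbor; use its largest distance.
        delta.append(min(denser) if denser else max(dist[i]))
    return [r * d for r, d in zip(rho, delta)]


def pick_centers(gamma):
    """Indices ranked above the largest gap in the descending gamma curve."""
    if len(gamma) < 2:
        return list(range(len(gamma)))
    order = sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)
    gaps = [gamma[order[i]] - gamma[order[i + 1]] for i in range(len(order) - 1)]
    cut = max(range(len(gaps)), key=lambda i: gaps[i])
    return order[:cut + 1]


# Toy data: a tight cross around (0, 0) and a looser cross around (20, 20).
points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),
          (20, 20), (23, 20), (17, 20), (20, 23), (20, 17)]

dense, sparse, kth = split_by_knn_jump(points, k=2)

centers = []
for group in (dense, sparse):
    # Assumption: each density group gets its own d_c from its k-distances.
    d_c = max(kth[i] for i in group)
    gamma = gamma_scores([points[i] for i in group], d_c)
    centers += [group[c] for c in pick_centers(gamma)]

print("dense group:", dense)
print("sparse group:", sparse)
print("cluster centers:", sorted(centers))
```

On this toy data the split separates the tight group from the loose one, and each group yields its middle point as the single center. Giving each group its own d_c mirrors the motivation stated in the description, namely that one parameter cannot suit data of several densities.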