
Performance Analysis and Architecture of a Clustering Hybrid Algorithm Called FA+GA-DBSCAN Using Artificial Datasets

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a widely used algorithm for exploratory clustering applications. Despite the DBSCAN algorithm being considered an unsupervised pattern recognition method, it has two parameters that must be tuned prior to the clustering process...


Bibliographic Details
Main authors: Perafan-Lopez, Juan Carlos; Ferrer-Gregory, Valeria Lucía; Nieto-Londoño, César; Sierra-Pérez, Julián
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9322930/
https://www.ncbi.nlm.nih.gov/pubmed/35885099
http://dx.doi.org/10.3390/e24070875
author Perafan-Lopez, Juan Carlos
Ferrer-Gregory, Valeria Lucía
Nieto-Londoño, César
Sierra-Pérez, Julián
collection PubMed
description Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a widely used algorithm for exploratory clustering applications. Although DBSCAN is considered an unsupervised pattern recognition method, it has two parameters that must be tuned before clustering in order to reduce uncertainty: MinPts, the minimum number of points in a clustering segmentation, and Eps, the radius around selected points of a specific dataset. This article presents the performance of a hybrid clustering algorithm that automatically groups datasets in a two-dimensional space using the well-known DBSCAN algorithm. A nearest-neighbor function and a genetic algorithm were used to automate the selection of MinPts and Eps. Furthermore, Factor Analysis (FA) was used as a pre-processing step to reduce the dimensionality of datasets with more than two dimensions. Finally, the performance of the resulting algorithm, called FA+GA-DBSCAN, was evaluated using artificial datasets. In addition, the precision and entropy of the hybrid clustering algorithm were measured, which showed a lower probability of error when clustering the most condensed datasets.
format Online
Article
Text
id pubmed-9322930
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9322930 2022-07-27 Performance Analysis and Architecture of a Clustering Hybrid Algorithm Called FA+GA-DBSCAN Using Artificial Datasets Perafan-Lopez, Juan Carlos; Ferrer-Gregory, Valeria Lucía; Nieto-Londoño, César; Sierra-Pérez, Julián Entropy (Basel) Article MDPI 2022-06-25 /pmc/articles/PMC9322930/ /pubmed/35885099 http://dx.doi.org/10.3390/e24070875 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Performance Analysis and Architecture of a Clustering Hybrid Algorithm Called FA+GA-DBSCAN Using Artificial Datasets
topic Article
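The description above names three ingredients: DBSCAN with its two parameters MinPts and Eps, a nearest-neighbor function used to help set them, and an entropy measure of the resulting clusters. The following is a minimal standard-library sketch of those ideas on a toy two-dimensional dataset. It is not the authors' FA+GA-DBSCAN implementation: the genetic algorithm and the Factor Analysis step are omitted, MinPts is fixed by hand, and the helper names (k_distance, eps_from_largest_gap, dbscan, cluster_entropy) and the largest-gap rule for choosing Eps are assumptions made only for illustration.

```python
import math
from collections import Counter

def k_distance(points, k):
    """Sorted list of each point's distance to its k-th nearest neighbor."""
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[k - 1])
    return sorted(out)

def eps_from_largest_gap(kd):
    """Hypothetical heuristic: take the k-distance just below the largest jump."""
    gaps = [kd[i + 1] - kd[i] for i in range(len(kd) - 1)]
    return kd[gaps.index(max(gaps))]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, with -1 meaning noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # The neighborhood includes the point itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:  # only core points expand the cluster
                queue.extend(nb_j)
    return labels

def cluster_entropy(labels, truth):
    """Size-weighted Shannon entropy (bits) of the true classes inside each group.

    Noise (-1) is treated as one more group in this sketch.
    """
    n = len(labels)
    total = 0.0
    for c in set(labels):
        members = [t for l, t in zip(labels, truth) if l == c]
        counts = Counter(members)
        h = -sum((m / len(members)) * math.log2(m / len(members))
                 for m in counts.values())
        total += len(members) / n * h
    return total

# Toy data: two compact groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (10.0, 0.0)]
truth = [0, 0, 0, 0, 1, 1, 1, 1, 2]

min_pts = 3                                   # assumed value, chosen by hand
eps = eps_from_largest_gap(k_distance(points, min_pts - 1))
labels = dbscan(points, eps, min_pts)
print(labels, cluster_entropy(labels, truth))
```

An entropy of zero means every group contains points of a single true class; mixed groups raise the value. In the article the parameter search is automated rather than fixed as it is here.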