Does Determination of Initial Cluster Centroids Improve the Performance of K-Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm, Minimum Spanning Tree, and Hierarchical Clustering in an Applied Study

Random selection of initial centroids (centers) for clusters is a fundamental weakness of the K-means clustering algorithm: its performance depends on the initial centroids, and it may converge to a local optimum. Various hybrid methods have been introduced to resolve this defect in K-means c...
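The dependence on initial centroids described above can be illustrated with a small sketch. It is not the authors' implementation: the four-point dataset, the deliberately poor starting centroids, and the helper names (dist2, mean, kmeans, hierarchical_init) are all assumptions made for illustration, and the hierarchical step uses a simple centroid-linkage merge that may differ from the linkage used in the paper.

```python
# Toy illustration (assumed data and helper names, not the paper's code):
# K-means started from poor centroids stalls in a local optimum, while
# centroids seeded by a simple agglomerative pass reach the better solution.

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm from the given centroids; returns (centroids, SSE)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [mean(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    sse = sum(min(dist2(p, c) for c in centroids) for p in points)
    return centroids, sse

def hierarchical_init(points, k):
    """Merge the two clusters with the closest means until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist2(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return [mean(c) for c in clusters]

# Corners of a wide rectangle: the natural clusters are the left and right pairs.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Assumed poor start: centroids that split the data into top and bottom rows.
_, sse_poor = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

# Start seeded by the agglomerative pass.
_, sse_seeded = kmeans(points, hierarchical_init(points, 2))

print(sse_poor)    # 16.0 (stuck in the top/bottom split)
print(sse_seeded)  # 1.0  (left/right split)
```

On this toy input the poor start never leaves the top/bottom split, because each point is already closest to the centroid of its own row. A single example like this says nothing about how often seeding helps on real data, which is the question the study itself addresses.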

Bibliographic Details
Main Authors: Pourahmad, Saeedeh; Basirat, Atefeh; Rahimi, Amir; Doostfatemeh, Marziyeh
Format: Online Article Text
Language: English
Published: Hindawi 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7416251/
https://www.ncbi.nlm.nih.gov/pubmed/32802153
http://dx.doi.org/10.1155/2020/7636857
_version_ 1783569289367781376
author Pourahmad, Saeedeh
Basirat, Atefeh
Rahimi, Amir
Doostfatemeh, Marziyeh
collection PubMed
description Random selection of initial centroids (centers) for clusters is a fundamental weakness of the K-means clustering algorithm: its performance depends on the initial centroids, and it may converge to a local optimum. Various hybrid methods have been introduced to resolve this defect. Because no comparative studies have examined these methods across various aspects, the present paper compares the ordinary K-means clustering algorithm with three hybrid methods based on a genetic algorithm, a minimum spanning tree, and hierarchical clustering. Although these three hybrid methods have received relatively more attention in previous research, few studies have compared their results. Hence, the present study uses seven quantitative datasets that differ in sample size, number of features, and number of classes. Eleven external and internal evaluation indices were also used to compare the methods. The results indicated that the hybrid methods achieved a higher convergence rate in reaching the final solution than the ordinary K-means method. Furthermore, the hybrid method based on hierarchical clustering converges to the optimal solution in fewer iterations than the other two hybrid methods. However, the hybrid methods based on minimum spanning trees and genetic algorithms are not always, or even often, more effective than the ordinary K-means method. Therefore, despite their computational complexity, these three hybrid methods have not led to much improvement over the K-means method. A simulation study is nevertheless required to compare the methods and complete the conclusion.
format Online
Article
Text
id pubmed-7416251
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-7416251 2020-08-14 Comput Math Methods Med Research Article Hindawi 2020-08-01 /pmc/articles/PMC7416251/ /pubmed/32802153 http://dx.doi.org/10.1155/2020/7636857 Text en Copyright © 2020 Saeedeh Pourahmad et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Does Determination of Initial Cluster Centroids Improve the Performance of K-Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm, Minimum Spanning Tree, and Hierarchical Clustering in an Applied Study
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7416251/
https://www.ncbi.nlm.nih.gov/pubmed/32802153
http://dx.doi.org/10.1155/2020/7636857