
An efficient framework for obtaining the initial cluster centers

Clustering is an important tool for data mining because it can uncover key patterns without any prior supervisory information. The initial selection of cluster centers plays a key role in the final quality of the clustering. Researchers often adopt a random approach for this purpose in an urge...


Bibliographic Details
Main Authors: Mishra, B. K., Mohanty, Sachi Nandan, Baidyanath, R. R., Ali, Shahid, Abduvalieva, D., Awwad, Fuad A., Ismail, Emad A. A., Gupta, Manish
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10682192/
https://www.ncbi.nlm.nih.gov/pubmed/38012340
http://dx.doi.org/10.1038/s41598-023-48220-3
_version_ 1785150926828339200
author Mishra, B. K.
Mohanty, Sachi Nandan
Baidyanath, R. R.
Ali, Shahid
Abduvalieva, D.
Awwad, Fuad A.
Ismail, Emad A. A.
Gupta, Manish
author_facet Mishra, B. K.
Mohanty, Sachi Nandan
Baidyanath, R. R.
Ali, Shahid
Abduvalieva, D.
Awwad, Fuad A.
Ismail, Emad A. A.
Gupta, Manish
author_sort Mishra, B. K.
collection PubMed
description Clustering is an important tool for data mining because it can uncover key patterns without any prior supervisory information. The initial selection of cluster centers plays a key role in the final quality of the clustering. Researchers often choose the centers at random in order to obtain them quickly and speed up their model. In doing so, however, they sacrifice the true essence of subgroup formation and on numerous occasions end up with poor clustering. For this reason, we suggest a qualitative approach for obtaining the initial cluster centers, with a focus on attaining well-separated clusters. Our initial contributions were alterations to the classical K-means algorithm in an attempt to obtain near-optimal cluster centers. We previously suggested a few approaches, namely far efficient K-means (FEKM), modified center K-means (MCKM) and modified FEKM using Quickhull (MFQ), which produced accurate centers and led to excellent cluster formation. K-means, which selects its centers randomly, seems to converge slightly earlier than these methods, which is their only weakness. We continued the study in order to reduce the computational cost of our methods and arrived at farthest leap center selection (FLCS). All these methods were thoroughly analyzed in terms of clustering effectiveness, correctness, homogeneity, completeness, complexity and actual execution time to convergence. For effectiveness, performance indices such as Dunn’s index, the Davies–Bouldin index and the silhouette coefficient were used; for correctness, the Rand measure was used; and for homogeneity and completeness, the V-measure was used. Experimental results on a variety of real-world datasets taken from the UCI repository suggested that both FEKM and FLCS obtain well-separated centers, while the latter converges earlier.
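The contrast the abstract draws between random center selection and well-separated initial centers can be illustrated with a minimal sketch. The code below is not the authors' FEKM, MCKM, MFQ or FLCS, whose details this record does not give. It uses the generic farthest-first (maximin) seeding heuristic as a stand-in, and all function names are hypothetical.

```python
import math
import random


def random_centers(points, k, seed=0):
    """Baseline: pick k distinct data points uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(points, k)


def farthest_first_centers(points, k):
    """Generic maximin seeding (an assumption, NOT the paper's FEKM or FLCS)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # Deterministic start: the data point farthest from the overall mean.
    centers = [max(points, key=lambda p: math.dist(p, mean))]
    # Distance from each point to its nearest already-chosen center.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return centers


def min_center_separation(centers):
    """Smallest pairwise distance among the centers (larger is better spread)."""
    return min(math.dist(a, b)
               for i, a in enumerate(centers)
               for b in centers[i + 1:])
```

Comparing `min_center_separation(farthest_first_centers(points, k))` with the same measure for `random_centers(points, k)` gives a rough sense of how much better spread the deterministic seeds are. Whether that translates into better final clusters would still need the indices named in the abstract.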
format Online
Article
Text
id pubmed-10682192
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10682192 2023-11-30 An efficient framework for obtaining the initial cluster centers Mishra, B. K. Mohanty, Sachi Nandan Baidyanath, R. R. Ali, Shahid Abduvalieva, D. Awwad, Fuad A. Ismail, Emad A. A. Gupta, Manish Sci Rep Article Clustering is an important tool for data mining because it can uncover key patterns without any prior supervisory information. The initial selection of cluster centers plays a key role in the final quality of the clustering. Researchers often choose the centers at random in order to obtain them quickly and speed up their model. In doing so, however, they sacrifice the true essence of subgroup formation and on numerous occasions end up with poor clustering. For this reason, we suggest a qualitative approach for obtaining the initial cluster centers, with a focus on attaining well-separated clusters. Our initial contributions were alterations to the classical K-means algorithm in an attempt to obtain near-optimal cluster centers. We previously suggested a few approaches, namely far efficient K-means (FEKM), modified center K-means (MCKM) and modified FEKM using Quickhull (MFQ), which produced accurate centers and led to excellent cluster formation. K-means, which selects its centers randomly, seems to converge slightly earlier than these methods, which is their only weakness. We continued the study in order to reduce the computational cost of our methods and arrived at farthest leap center selection (FLCS). All these methods were thoroughly analyzed in terms of clustering effectiveness, correctness, homogeneity, completeness, complexity and actual execution time to convergence. For effectiveness, performance indices such as Dunn’s index, the Davies–Bouldin index and the silhouette coefficient were used; for correctness, the Rand measure was used; and for homogeneity and completeness, the V-measure was used. Experimental results on a variety of real-world datasets taken from the UCI repository suggested that both FEKM and FLCS obtain well-separated centers, while the latter converges earlier. Nature Publishing Group UK 2023-11-27 /pmc/articles/PMC10682192/ /pubmed/38012340 http://dx.doi.org/10.1038/s41598-023-48220-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Mishra, B. K.
Mohanty, Sachi Nandan
Baidyanath, R. R.
Ali, Shahid
Abduvalieva, D.
Awwad, Fuad A.
Ismail, Emad A. A.
Gupta, Manish
An efficient framework for obtaining the initial cluster centers
title An efficient framework for obtaining the initial cluster centers
title_full An efficient framework for obtaining the initial cluster centers
title_fullStr An efficient framework for obtaining the initial cluster centers
title_full_unstemmed An efficient framework for obtaining the initial cluster centers
title_short An efficient framework for obtaining the initial cluster centers
title_sort efficient framework for obtaining the initial cluster centers
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10682192/
https://www.ncbi.nlm.nih.gov/pubmed/38012340
http://dx.doi.org/10.1038/s41598-023-48220-3