An efficient framework for obtaining the initial cluster centers
Clustering is an important tool for data mining since it can determine key patterns without any prior supervisory information. The initial selection of cluster centers plays a key role in the ultimate effect of clustering. More often researchers adopt the random approach for this purpose in an urge...
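Main Authors: | Mishra, B. K.; Mohanty, Sachi Nandan; Baidyanath, R. R.; Ali, Shahid; Abduvalieva, D.; Awwad, Fuad A.; Ismail, Emad A. A.; Gupta, Manish |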
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10682192/ https://www.ncbi.nlm.nih.gov/pubmed/38012340 http://dx.doi.org/10.1038/s41598-023-48220-3 |
_version_ | 1785150926828339200 |
---|---|
author | Mishra, B. K.; Mohanty, Sachi Nandan; Baidyanath, R. R.; Ali, Shahid; Abduvalieva, D.; Awwad, Fuad A.; Ismail, Emad A. A.; Gupta, Manish |
author_facet | Mishra, B. K.; Mohanty, Sachi Nandan; Baidyanath, R. R.; Ali, Shahid; Abduvalieva, D.; Awwad, Fuad A.; Ismail, Emad A. A.; Gupta, Manish |
author_sort | Mishra, B. K. |
collection | PubMed |
description | Clustering is an important tool for data mining because it can reveal key patterns without any prior supervisory information. The initial selection of cluster centers plays a key role in the final clustering result. Researchers often select the centers at random in order to obtain them quickly and speed up their models. However, doing so sacrifices the true essence of subgroup formation and on numerous occasions ends in poor clustering. For this reason we proposed a qualitative approach for obtaining the initial cluster centers, with a focus on attaining well-separated clusters. Our initial contributions were alterations to the classical K-means algorithm in an attempt to obtain near-optimal cluster centers. We earlier suggested several approaches, namely far efficient K-means (FEKM), modified center K-means (MCKM), and modified FEKM using Quickhull (MFQ), which produced accurate centers leading to excellent cluster formation. K-means, which selects the centers randomly, seems to converge slightly earlier than these methods, which is their only weakness. We continued this line of study to reduce the computational cost of our methods and arrived at farthest leap center selection (FLCS). All these methods were thoroughly analyzed with respect to clustering effectiveness, correctness, homogeneity, completeness, complexity, and actual execution time to convergence. Effectiveness was measured with Dunn’s index, the Davies–Bouldin index, and the silhouette coefficient; correctness with the Rand measure; and homogeneity and completeness with the V-measure. Experimental results on a variety of real-world datasets from the UCI repository suggested that both FEKM and FLCS obtain well-separated centers, while the latter converges earlier. |
format | Online Article Text |
id | pubmed-10682192 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10682192 2023-11-30 An efficient framework for obtaining the initial cluster centers Mishra, B. K.; Mohanty, Sachi Nandan; Baidyanath, R. R.; Ali, Shahid; Abduvalieva, D.; Awwad, Fuad A.; Ismail, Emad A. A.; Gupta, Manish Sci Rep Article Nature Publishing Group UK 2023-11-27 /pmc/articles/PMC10682192/ /pubmed/38012340 http://dx.doi.org/10.1038/s41598-023-48220-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Mishra, B. K. Mohanty, Sachi Nandan Baidyanath, R. R. Ali, Shahid Abduvalieva, D. Awwad, Fuad A. Ismail, Emad A. A. Gupta, Manish An efficient framework for obtaining the initial cluster centers |
title | An efficient framework for obtaining the initial cluster centers |
title_full | An efficient framework for obtaining the initial cluster centers |
title_fullStr | An efficient framework for obtaining the initial cluster centers |
title_full_unstemmed | An efficient framework for obtaining the initial cluster centers |
title_short | An efficient framework for obtaining the initial cluster centers |
title_sort | efficient framework for obtaining the initial cluster centers |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10682192/ https://www.ncbi.nlm.nih.gov/pubmed/38012340 http://dx.doi.org/10.1038/s41598-023-48220-3 |
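
The record's description contrasts random center selection with choosing well-separated initial centers, and names the Rand measure as the correctness check, but it does not give the steps of FEKM, MCKM, MFQ, or FLCS. The following is therefore only a minimal sketch of the general idea, using farthest-first traversal (a standard maximin heuristic) as a stand-in for those methods. The function names `farthest_first_centers`, `kmeans`, `rand_index`, and `demo_points`, and the toy data, are assumptions made for illustration and do not come from the article.

```python
import math
import random
from itertools import combinations


def farthest_first_centers(points, k, seed=0):
    """Pick k initial centers by farthest-first traversal (maximin heuristic).

    Illustrative stand-in only: this is NOT the FEKM, MCKM, MFQ, or FLCS
    procedure, whose steps are not given in the record.
    """
    rng = random.Random(seed)
    first = rng.randrange(len(points))
    centers = [points[first]]
    # min_dist[i] = distance from points[i] to its nearest chosen center
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=min_dist.__getitem__)
        centers.append(points[idx])
        min_dist = [min(d, math.dist(p, points[idx]))
                    for d, p in zip(min_dist, points)]
    return centers


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations; returns (labels, centers, iterations used)."""
    centers = [tuple(c) for c in centers]
    labels = None
    for iteration in range(1, max_iter + 1):
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            return labels, centers, iteration
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers, max_iter


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree (same/different)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def demo_points():
    """Three small, well-separated 2-D groups with known labels (toy data)."""
    points, truth = [], []
    for label, (cx, cy) in enumerate([(0, 0), (10, 0), (0, 10)]):
        for dx in (-0.5, 0.0, 0.5):
            for dy in (-0.5, 0.0, 0.5):
                points.append((cx + dx, cy + dy))
                truth.append(label)
    return points, truth
```

A possible use is to call `demo_points()`, seed with `farthest_first_centers(points, 3)`, run `kmeans`, and compare the resulting labels with the known ones via `rand_index`. Swapping the seeding for `random.sample(points, 3)` gives the random baseline the description refers to; on real data the two would also be compared on iterations to convergence.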