An Improved K-means Clustering Algorithm Towards an Efficient Data-Driven Modeling

K-means is one of the best-known unsupervised machine learning algorithms. It partitions the data into distinct, non-overlapping clusters, assigning each point to exactly one cluster. Assignment follows the minimum squared distance criterion: each point goes to the cluster whose centroid is nearest. One...
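
The assignment rule described in the abstract can be illustrated with a minimal sketch of standard K-means using random centroid initialization, which is the baseline the paper compares against. This is not the authors' proposed initialization, which this record does not describe. The function names (`squared_distance`, `assign`, `update`, `kmeans`), the `seed` and `max_iter` parameters, and the toy data are all hypothetical choices made for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def kmeans(points, k, seed=0, max_iter=100):
    """Standard K-means with random initialization (baseline, not the paper's method).

    Returns (centroids, labels, iterations), where iterations counts the
    assignment passes performed, including the final pass that detects
    that no label changed.
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for iteration in range(1, max_iter + 1):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            return centroids, labels, iteration
        labels = new_labels
        centroids = update(points, labels, centroids)
    return centroids, labels, max_iter


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, iterations = kmeans(points, k=2)
print(centroids, labels, iterations)
```

The returned iteration count is the quantity that a better initialization aims to reduce: the closer the starting centroids are to their final positions, the fewer assignment passes are needed before the labels stop changing.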

Bibliographic Details
Main authors:	Zubair, Md., Iqbal, MD. Asif, Shil, Avijeet, Chowdhury, M. J. M., Moni, Mohammad Ali, Sarker, Iqbal H.
Format:	Online Article Text
Language:	English
Published:	Springer Berlin Heidelberg 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9243813/ http://dx.doi.org/10.1007/s40745-022-00428-2

_version_	1784738396230385664
author	Zubair, Md. Iqbal, MD. Asif Shil, Avijeet Chowdhury, M. J. M. Moni, Mohammad Ali Sarker, Iqbal H.
author_facet	Zubair, Md. Iqbal, MD. Asif Shil, Avijeet Chowdhury, M. J. M. Moni, Mohammad Ali Sarker, Iqbal H.
author_sort	Zubair, Md.
collection	PubMed
description	K-means is one of the best-known unsupervised machine learning algorithms. It partitions the data into distinct, non-overlapping clusters, assigning each point to exactly one cluster. Assignment follows the minimum squared distance criterion: each point goes to the cluster whose centroid is nearest. One of the main concerns with K-means is choosing good initial centroids, and the most challenging task is to determine the optimum positions of the cluster centroids at the very first iteration. This paper proposes an approach that finds the optimal initial centroids efficiently, reducing both the number of iterations and the execution time. To assess the effectiveness of the proposed method, the authors conducted experiments on several real-world datasets. They first analyzed COVID-19 and patient datasets to demonstrate the method's efficiency. A synthetic dataset of 10M instances with 8 dimensions was also used to estimate the performance of the proposed algorithm. Experimental results show that the proposed method outperforms the traditional k-means++ and random centroid initialization methods in both computation time and number of iterations.
format	Online Article Text
id	pubmed-9243813
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Springer Berlin Heidelberg
record_format	MEDLINE/PubMed
spelling	pubmed-92438132022-06-30 An Improved K-means Clustering Algorithm Towards an Efficient Data-Driven Modeling Zubair, Md. Iqbal, MD. Asif Shil, Avijeet Chowdhury, M. J. M. Moni, Mohammad Ali Sarker, Iqbal H. Ann. Data. Sci. Article K-means algorithm is one of the well-known unsupervised machine learning algorithms. The algorithm typically finds out distinct non-overlapping clusters in which each point is assigned to a group. The minimum squared distance technique distributes each point to the nearest clusters or subgroups. One of the K-means algorithm’s main concerns is to find out the initial optimal centroids of clusters. It is the most challenging task to determine the optimum position of the initial clusters’ centroids at the very first iteration. This paper proposes an approach to find the optimal initial centroids efficiently to reduce the number of iterations and execution time. To analyze the effectiveness of our proposed method, we have utilized different real-world datasets to conduct experiments. We have first analyzed COVID-19 and patient datasets to show our proposed method’s efficiency. A synthetic dataset of 10M instances with 8 dimensions is also used to estimate the performance of the proposed algorithm. Experimental results show that our proposed method outperforms traditional kmeans++ and random centroids initialization methods regarding the computation time and the number of iterations. Springer Berlin Heidelberg 2022-06-25 /pmc/articles/PMC9243813/ http://dx.doi.org/10.1007/s40745-022-00428-2 Text en © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle	Article Zubair, Md. Iqbal, MD. Asif Shil, Avijeet Chowdhury, M. J. M. Moni, Mohammad Ali Sarker, Iqbal H. An Improved K-means Clustering Algorithm Towards an Efficient Data-Driven Modeling
title	An Improved K-means Clustering Algorithm Towards an Efficient Data-Driven Modeling
title_full	An Improved K-means Clustering Algorithm Towards an Efficient Data-Driven Modeling
title_fullStr	An Improved K-means Clustering Algorithm Towards an Efficient Data-Driven Modeling
title_full_unstemmed	An Improved K-means Clustering Algorithm Towards an Efficient Data-Driven Modeling
title_short	An Improved K-means Clustering Algorithm Towards an Efficient Data-Driven Modeling
title_sort	improved k-means clustering algorithm towards an efficient data-driven modeling
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9243813/ http://dx.doi.org/10.1007/s40745-022-00428-2
