Balancing effort and benefit of K-means clustering algorithms in Big Data realms

In this paper we propose a criterion to balance the processing time and the solution quality of k-means cluster algorithms when applied to instances where the number n of objects is big. The majority of the known strategies aimed to improve the performance of k-means algorithms are related to the in...

Bibliographic Details
Main Authors:	Pérez-Ortega, Joaquín, Almanza-Ortega, Nelva Nely, Romero, David
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2018
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6124732/ https://www.ncbi.nlm.nih.gov/pubmed/30183705 http://dx.doi.org/10.1371/journal.pone.0201874

_version_	1783353067489460224
author	Pérez-Ortega, Joaquín; Almanza-Ortega, Nelva Nely; Romero, David
author_facet	Pérez-Ortega, Joaquín; Almanza-Ortega, Nelva Nely; Romero, David
author_sort	Pérez-Ortega, Joaquín
collection	PubMed
description	In this paper we propose a criterion to balance the processing time and the solution quality of k-means cluster algorithms when applied to instances where the number n of objects is big. The majority of the known strategies aimed to improve the performance of k-means algorithms are related to the initialization or classification steps. In contrast, our criterion applies in the convergence step, namely, the process stops whenever the number of objects that change their assigned cluster at any iteration is lower than a given threshold. Through computer experimentation with synthetic and real instances, we found that a threshold close to 0.03n involves a decrease in computing time of about a factor 4/100, yielding solutions whose quality reduces by less than two percent. These findings naturally suggest the usefulness of our criterion in Big Data realms.
format	Online Article Text
id	pubmed-6124732
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-61247322018-09-16 Balancing effort and benefit of K-means clustering algorithms in Big Data realms Pérez-Ortega, Joaquín Almanza-Ortega, Nelva Nely Romero, David PLoS One Research Article In this paper we propose a criterion to balance the processing time and the solution quality of k-means cluster algorithms when applied to instances where the number n of objects is big. The majority of the known strategies aimed to improve the performance of k-means algorithms are related to the initialization or classification steps. In contrast, our criterion applies in the convergence step, namely, the process stops whenever the number of objects that change their assigned cluster at any iteration is lower than a given threshold. Through computer experimentation with synthetic and real instances, we found that a threshold close to 0.03n involves a decrease in computing time of about a factor 4/100, yielding solutions whose quality reduces by less than two percent. These findings naturally suggest the usefulness of our criterion in Big Data realms. Public Library of Science 2018-09-05 /pmc/articles/PMC6124732/ /pubmed/30183705 http://dx.doi.org/10.1371/journal.pone.0201874 Text en © 2018 Pérez-Ortega et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Pérez-Ortega, Joaquín Almanza-Ortega, Nelva Nely Romero, David Balancing effort and benefit of K-means clustering algorithms in Big Data realms
title	Balancing effort and benefit of K-means clustering algorithms in Big Data realms
title_full	Balancing effort and benefit of K-means clustering algorithms in Big Data realms
title_fullStr	Balancing effort and benefit of K-means clustering algorithms in Big Data realms
title_full_unstemmed	Balancing effort and benefit of K-means clustering algorithms in Big Data realms
title_short	Balancing effort and benefit of K-means clustering algorithms in Big Data realms
title_sort	balancing effort and benefit of k-means clustering algorithms in big data realms
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6124732/ https://www.ncbi.nlm.nih.gov/pubmed/30183705 http://dx.doi.org/10.1371/journal.pone.0201874
work_keys_str_mv	AT perezortegajoaquin balancingeffortandbenefitofkmeansclusteringalgorithmsinbigdatarealms AT almanzaorteganelvanely balancingeffortandbenefitofkmeansclusteringalgorithmsinbigdatarealms AT romerodavid balancingeffortandbenefitofkmeansclusteringalgorithmsinbigdatarealms
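
The stopping criterion summarized in the description field (stop as soon as the number of objects that change cluster in an iteration falls below a threshold, reported as close to 0.03n) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name kmeans_early_stop, its parameters, the random initialization, and the handling of empty clusters are all assumptions made for illustration, and only the stopping test itself is taken from the abstract.

```python
import numpy as np

def kmeans_early_stop(X, k, threshold_fraction=0.03, max_iter=100, seed=0):
    """Lloyd-style k-means that stops once fewer than
    threshold_fraction * n objects change cluster in an iteration.
    Hypothetical helper, not taken from the paper."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumed initialization: k distinct objects chosen at random.
    centroids = X[rng.choice(n, size=k, replace=False)].astype(float)
    labels = np.full(n, -1)  # no object is assigned before the first pass
    for iteration in range(1, max_iter + 1):
        # Classification step: assign each object to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        changed = int((new_labels != labels).sum())
        labels = new_labels
        # Centroid update; an empty cluster keeps its previous centroid
        # (an assumption of this sketch).
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
        # Convergence step: stop when few objects changed cluster.
        if changed < threshold_fraction * n:
            break
    return labels, centroids, iteration

# Usage on synthetic data: three Gaussian blobs in the plane.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 1.0, size=(200, 2)) for c in (0.0, 5.0, 10.0)])
labels, centroids, n_iter = kmeans_early_stop(X, k=3)
print("stopped after", n_iter, "iterations")
```

Because the test is "fewer than threshold_fraction * n", a fraction of 0 never triggers the early stop and the loop runs to max_iter. A fraction below 1/n reproduces the usual rule of stopping when no object changes cluster.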
