
Balancing effort and benefit of K-means clustering algorithms in Big Data realms

In this paper we propose a criterion to balance the processing time and the solution quality of k-means cluster algorithms when applied to instances where the number n of objects is big. The majority of the known strategies aimed to improve the performance of k-means algorithms are related to the in...

Bibliographic Details
Main authors: Pérez-Ortega, Joaquín, Almanza-Ortega, Nelva Nely, Romero, David
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6124732/
https://www.ncbi.nlm.nih.gov/pubmed/30183705
http://dx.doi.org/10.1371/journal.pone.0201874
collection PubMed
description In this paper we propose a criterion to balance the processing time and the solution quality of k-means cluster algorithms when applied to instances where the number n of objects is big. The majority of the known strategies aimed to improve the performance of k-means algorithms are related to the initialization or classification steps. In contrast, our criterion applies in the convergence step, namely, the process stops whenever the number of objects that change their assigned cluster at any iteration is lower than a given threshold. Through computer experimentation with synthetic and real instances, we found that a threshold close to 0.03n involves a decrease in computing time of about a factor 4/100, yielding solutions whose quality reduces by less than two percent. These findings naturally suggest the usefulness of our criterion in Big Data realms.
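The stopping rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain Lloyd-style k-means in standard-library Python, and the function name `kmeans_threshold`, the parameter `threshold_fraction`, the random initialization, and the `max_iter` cap are all assumptions made for illustration. The only element taken from the abstract is the rule itself: stop as soon as the number of objects that change cluster in an iteration falls below a threshold such as 0.03n.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_threshold(points, k, threshold_fraction=0.03, max_iter=100, seed=0):
    """Lloyd-style k-means that stops early when few objects change cluster.

    threshold_fraction is a hypothetical parameter name: the run stops once
    fewer than threshold_fraction * n objects were reassigned in an iteration
    (or when no object changed, which is the usual convergence test).
    """
    rng = random.Random(seed)
    n = len(points)
    # Assumed initialization: k distinct objects drawn at random as centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * n
    min_changes = threshold_fraction * n
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Classification step: assign each object to its nearest centroid.
        changes = 0
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            if nearest != labels[i]:
                labels[i] = nearest
                changes += 1
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        # Convergence step: the thresholded stopping rule.
        if changes == 0 or changes < min_changes:
            break
    return labels, centroids, iterations


# Usage: two synthetic Gaussian blobs (illustrative data, not from the paper).
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
data += [(data_rng.gauss(10, 1), data_rng.gauss(10, 1)) for _ in range(200)]
_, _, early = kmeans_threshold(data, k=2, threshold_fraction=0.03)
_, _, full = kmeans_threshold(data, k=2, threshold_fraction=0.0)
print("iterations with threshold:", early, "without:", full)
```

Because both runs start from the same centroids and follow the same trajectory, the thresholded run can never take more iterations than the unthresholded one; how much time it saves and how much quality it costs on real data is what the paper measures.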
format Online
Article
Text
id pubmed-6124732
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6124732 2018-09-16 Balancing effort and benefit of K-means clustering algorithms in Big Data realms Pérez-Ortega, Joaquín Almanza-Ortega, Nelva Nely Romero, David PLoS One Research Article Public Library of Science 2018-09-05 /pmc/articles/PMC6124732/ /pubmed/30183705 http://dx.doi.org/10.1371/journal.pone.0201874 Text en © 2018 Pérez-Ortega et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article