Balancing effort and benefit of K-means clustering algorithms in Big Data realms
In this paper we propose a criterion to balance the processing time and the solution quality of k-means cluster algorithms when applied to instances where the number n of objects is big. The majority of the known strategies aimed to improve the performance of k-means algorithms are related to the in...
Main authors: | Pérez-Ortega, Joaquín; Almanza-Ortega, Nelva Nely; Romero, David |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6124732/ https://www.ncbi.nlm.nih.gov/pubmed/30183705 http://dx.doi.org/10.1371/journal.pone.0201874 |
_version_ | 1783353067489460224 |
---|---|
author | Pérez-Ortega, Joaquín Almanza-Ortega, Nelva Nely Romero, David |
author_facet | Pérez-Ortega, Joaquín Almanza-Ortega, Nelva Nely Romero, David |
author_sort | Pérez-Ortega, Joaquín |
collection | PubMed |
description | In this paper we propose a criterion to balance the processing time and the solution quality of k-means cluster algorithms when applied to instances where the number n of objects is big. The majority of the known strategies aimed to improve the performance of k-means algorithms are related to the initialization or classification steps. In contrast, our criterion applies in the convergence step, namely, the process stops whenever the number of objects that change their assigned cluster at any iteration is lower than a given threshold. Through computer experimentation with synthetic and real instances, we found that a threshold close to 0.03n involves a decrease in computing time of about a factor 4/100, yielding solutions whose quality reduces by less than two percent. These findings naturally suggest the usefulness of our criterion in Big Data realms. |
format | Online Article Text |
id | pubmed-6124732 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-61247322018-09-16 Balancing effort and benefit of K-means clustering algorithms in Big Data realms Pérez-Ortega, Joaquín Almanza-Ortega, Nelva Nely Romero, David PLoS One Research Article In this paper we propose a criterion to balance the processing time and the solution quality of k-means cluster algorithms when applied to instances where the number n of objects is big. The majority of the known strategies aimed to improve the performance of k-means algorithms are related to the initialization or classification steps. In contrast, our criterion applies in the convergence step, namely, the process stops whenever the number of objects that change their assigned cluster at any iteration is lower than a given threshold. Through computer experimentation with synthetic and real instances, we found that a threshold close to 0.03n involves a decrease in computing time of about a factor 4/100, yielding solutions whose quality reduces by less than two percent. These findings naturally suggest the usefulness of our criterion in Big Data realms. Public Library of Science 2018-09-05 /pmc/articles/PMC6124732/ /pubmed/30183705 http://dx.doi.org/10.1371/journal.pone.0201874 Text en © 2018 Pérez-Ortega et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Pérez-Ortega, Joaquín Almanza-Ortega, Nelva Nely Romero, David Balancing effort and benefit of K-means clustering algorithms in Big Data realms |
title | Balancing effort and benefit of K-means clustering algorithms in Big Data realms |
title_sort | balancing effort and benefit of k-means clustering algorithms in big data realms |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6124732/ https://www.ncbi.nlm.nih.gov/pubmed/30183705 http://dx.doi.org/10.1371/journal.pone.0201874 |
work_keys_str_mv | AT perezortegajoaquin balancingeffortandbenefitofkmeansclusteringalgorithmsinbigdatarealms AT almanzaorteganelvanely balancingeffortandbenefitofkmeansclusteringalgorithmsinbigdatarealms AT romerodavid balancingeffortandbenefitofkmeansclusteringalgorithmsinbigdatarealms |
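
The stopping criterion summarized in the description field above (stop as soon as the number of objects that change cluster in an iteration falls below a threshold such as 0.03n) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `kmeans_early_stop`, its parameters, the random initialization, and the plain Lloyd-style update are all assumptions made for illustration.

```python
import random


def kmeans_early_stop(points, k, threshold_fraction=0.03, max_iter=100, seed=0):
    """Lloyd-style k-means with an early-stop rule (hypothetical helper).

    Iteration stops once the number of points that changed cluster in the
    current pass is lower than threshold_fraction * n.
    """
    rng = random.Random(seed)
    n = len(points)
    # Assumed initialization: k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * n
    threshold = threshold_fraction * n
    iterations = 0

    for iterations in range(1, max_iter + 1):
        changed = 0
        # Classification step: assign each point to its nearest centroid.
        for i, p in enumerate(points):
            best = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            if best != labels[i]:
                labels[i] = best
                changed += 1
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        # Convergence step: stop when few points changed cluster.
        if changed < threshold:
            break

    return labels, centroids, iterations


if __name__ == "__main__":
    demo_rng = random.Random(1)
    data = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(500)]
    data += [(demo_rng.gauss(8, 1), demo_rng.gauss(8, 1)) for _ in range(500)]
    _, _, used = kmeans_early_stop(data, k=2)
    print("iterations used:", used)
```

With the same seed, a run with `threshold_fraction=0.03` never uses more iterations than a run that waits until no point changes cluster, because both follow the same sequence of assignments until the early-stop rule fires. How much time is saved, and how much solution quality is lost, depends on the data; the figures quoted in the description are the authors' experimental results, not properties of this sketch.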