Efficient similarity-based data clustering by optimal object to cluster reallocation
We present an iterative flat hard clustering algorithm designed to operate on arbitrary similarity matrices, with the only constraint that these matrices be symmetrical. Although functionally very close to kernel k-means, our proposal performs a maximization of average intra-class similarity, instea...
Main Authors: | Rossignol, Mathias; Lagrange, Mathieu; Cont, Arshia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2018 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5983489/ https://www.ncbi.nlm.nih.gov/pubmed/29856755 http://dx.doi.org/10.1371/journal.pone.0197450 |
_version_ | 1783328431606333440 |
---|---|
author | Rossignol, Mathias Lagrange, Mathieu Cont, Arshia |
author_facet | Rossignol, Mathias Lagrange, Mathieu Cont, Arshia |
author_sort | Rossignol, Mathias |
collection | PubMed |
description | We present an iterative flat hard clustering algorithm designed to operate on arbitrary similarity matrices, with the only constraint that these matrices be symmetrical. Although functionally very close to kernel k-means, our proposal performs a maximization of average intra-class similarity, instead of a squared distance minimization, in order to remain closer to the semantics of similarities. We show that this approach permits the relaxing of some conditions on usable affinity matrices like semi-positiveness, as well as opening possibilities for computational optimization required for large datasets. Systematic evaluation on a variety of data sets shows that compared with kernel k-means and the spectral clustering methods, the proposed approach gives equivalent or better performance, while running much faster. Most notably, it significantly reduces memory access, which makes it a good choice for large data collections. Material enabling the reproducibility of the results is made available online. |
format | Online Article Text |
id | pubmed-5983489 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-59834892018-06-17 Efficient similarity-based data clustering by optimal object to cluster reallocation Rossignol, Mathias Lagrange, Mathieu Cont, Arshia PLoS One Research Article We present an iterative flat hard clustering algorithm designed to operate on arbitrary similarity matrices, with the only constraint that these matrices be symmetrical. Although functionally very close to kernel k-means, our proposal performs a maximization of average intra-class similarity, instead of a squared distance minimization, in order to remain closer to the semantics of similarities. We show that this approach permits the relaxing of some conditions on usable affinity matrices like semi-positiveness, as well as opening possibilities for computational optimization required for large datasets. Systematic evaluation on a variety of data sets shows that compared with kernel k-means and the spectral clustering methods, the proposed approach gives equivalent or better performance, while running much faster. Most notably, it significantly reduces memory access, which makes it a good choice for large data collections. Material enabling the reproducibility of the results is made available online. Public Library of Science 2018-06-01 /pmc/articles/PMC5983489/ /pubmed/29856755 http://dx.doi.org/10.1371/journal.pone.0197450 Text en © 2018 Rossignol et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Rossignol, Mathias Lagrange, Mathieu Cont, Arshia Efficient similarity-based data clustering by optimal object to cluster reallocation |
title | Efficient similarity-based data clustering by optimal object to cluster reallocation |
title_full | Efficient similarity-based data clustering by optimal object to cluster reallocation |
title_fullStr | Efficient similarity-based data clustering by optimal object to cluster reallocation |
title_full_unstemmed | Efficient similarity-based data clustering by optimal object to cluster reallocation |
title_short | Efficient similarity-based data clustering by optimal object to cluster reallocation |
title_sort | efficient similarity-based data clustering by optimal object to cluster reallocation |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5983489/ https://www.ncbi.nlm.nih.gov/pubmed/29856755 http://dx.doi.org/10.1371/journal.pone.0197450 |
work_keys_str_mv | AT rossignolmathias efficientsimilaritybaseddataclusteringbyoptimalobjecttoclusterreallocation AT lagrangemathieu efficientsimilaritybaseddataclusteringbyoptimalobjecttoclusterreallocation AT contarshia efficientsimilaritybaseddataclusteringbyoptimalobjecttoclusterreallocation |
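
The description above outlines an iterative hard clustering scheme that reallocates objects between clusters to maximize average intra-class similarity on a symmetric similarity matrix. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the objective is the sum over clusters of the within-cluster similarity total divided by the cluster size, and the names `objective`, `reallocate`, `max_sweeps`, and `tol` are hypothetical. Every cluster must be non-empty at the start, and a move that would empty a cluster is skipped.

```python
import numpy as np


def objective(S, labels, k):
    """Assumed objective: sum over clusters of (within-cluster similarity sum) / size."""
    total = 0.0
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        if idx.size:
            total += S[np.ix_(idx, idx)].sum() / idx.size
    return total


def reallocate(S, labels, k, max_sweeps=100, tol=1e-12):
    """Greedy object-to-cluster reallocation on a symmetric similarity matrix S.

    Each object is moved to the cluster giving the largest strictly positive
    gain in the assumed objective; sweeps repeat until no object moves.
    """
    S = np.asarray(S, dtype=float)
    labels = np.array(labels, dtype=int)
    n = S.shape[0]
    if not np.allclose(S, S.T):
        raise ValueError("S must be symmetric")

    onehot = np.zeros((n, k))
    onehot[np.arange(n), labels] = 1.0
    counts = onehot.sum(axis=0)
    if np.any(counts == 0):
        raise ValueError("every cluster must start non-empty")

    A = S @ onehot                        # A[i, c] = sum of S[i, j] for j in cluster c
    W = np.einsum("ic,ic->c", onehot, A)  # W[c] = sum of S[i, j] for i, j in cluster c

    for _ in range(max_sweeps):
        moved = False
        for i in range(n):
            a = labels[i]
            if counts[a] <= 1:
                continue  # never empty a cluster
            # Within-cluster sum of a after removing i (uses symmetry of S).
            w_a_new = W[a] - 2.0 * A[i, a] + S[i, i]
            loss = w_a_new / (counts[a] - 1) - W[a] / counts[a]
            # Within-cluster sum of every cluster if i were added to it.
            w_new = W + 2.0 * A[i] + S[i, i]
            gains = loss + w_new / (counts + 1) - W / counts
            gains[a] = -np.inf  # staying put is not a move
            b = int(np.argmax(gains))
            if gains[b] > tol:
                W[a], W[b] = w_a_new, w_new[b]
                counts[a] -= 1
                counts[b] += 1
                A[:, a] -= S[:, i]
                A[:, b] += S[:, i]
                labels[i] = b
                moved = True
        if not moved:
            break
    return labels


if __name__ == "__main__":
    # Toy data: two blocks of three mutually similar objects.
    S = np.zeros((6, 6))
    S[:3, :3] = 1.0
    S[3:, 3:] = 1.0
    start = np.array([0, 0, 1, 1, 1, 1])
    final = reallocate(S, start, k=2)
    print(final, objective(S, final, 2))
```

Because the per-object sums `A` and per-cluster sums `W` are updated incrementally, each candidate move is scored without rescanning the full matrix, which is one plausible reading of the reduced memory access mentioned in the description. Like any greedy local search, the sketch can stop at a local optimum that depends on the initial labels.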