Efficient similarity-based data clustering by optimal object to cluster reallocation

We present an iterative flat hard clustering algorithm designed to operate on arbitrary similarity matrices, with the only constraint that these matrices be symmetrical. Although functionally very close to kernel k-means, our proposal performs a maximization of average intra-class similarity, instea...

Bibliographic Details
Main Authors: Rossignol, Mathias; Lagrange, Mathieu; Cont, Arshia
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5983489/
https://www.ncbi.nlm.nih.gov/pubmed/29856755
http://dx.doi.org/10.1371/journal.pone.0197450
author Rossignol, Mathias
Lagrange, Mathieu
Cont, Arshia
collection PubMed
description We present an iterative flat hard clustering algorithm designed to operate on arbitrary similarity matrices, with the only constraint that these matrices be symmetrical. Although functionally very close to kernel k-means, our proposal performs a maximization of average intra-class similarity, instead of a squared distance minimization, in order to remain closer to the semantics of similarities. We show that this approach permits the relaxing of some conditions on usable affinity matrices like semi-positiveness, as well as opening possibilities for computational optimization required for large datasets. Systematic evaluation on a variety of data sets shows that compared with kernel k-means and the spectral clustering methods, the proposed approach gives equivalent or better performance, while running much faster. Most notably, it significantly reduces memory access, which makes it a good choice for large data collections. Material enabling the reproducibility of the results is made available online.
format Online
Article
Text
id pubmed-5983489
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5983489 2018-06-17. Efficient similarity-based data clustering by optimal object to cluster reallocation. Rossignol, Mathias; Lagrange, Mathieu; Cont, Arshia. PLoS One. Research Article. Public Library of Science 2018-06-01. /pmc/articles/PMC5983489/ /pubmed/29856755 http://dx.doi.org/10.1371/journal.pone.0197450 Text en. © 2018 Rossignol et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Efficient similarity-based data clustering by optimal object to cluster reallocation
topic Research Article