Unsupervised ranking of clustering algorithms by INFOMAX

Bibliographic Details
Main authors: Sikdar, Sandipan; Mukherjee, Animesh; Marsili, Matteo
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7588117/
https://www.ncbi.nlm.nih.gov/pubmed/33104709
http://dx.doi.org/10.1371/journal.pone.0239331
Collection: PubMed
Abstract: Clustering and community detection provide a concise way of extracting meaningful information from large datasets. An ever-growing number of data clustering and community detection algorithms have been proposed. In this paper, we address the question of ranking the performance of clustering algorithms on a given dataset. We show that, for hard clustering and community detection, Linsker’s Infomax principle can be used to rank clustering algorithms. In brief, for a given number of clusters, the algorithm whose partition has the highest entropy is the best one. Indeed, we show on a wide range of datasets of various sizes and topological structures that the ranking of a variety of partitioning algorithms by the entropy of their partitions is strongly correlated with the overlap of those partitions with a ground-truth partition. The code related to the project is available at https://github.com/Sandipan99/Ranking_cluster_algorithms.
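The ranking rule stated in the abstract can be illustrated with a minimal Python sketch. It assumes that "entropy of the partition" means the Shannon entropy of the cluster-size distribution, H = -Σ_k (n_k/N) ln(n_k/N), where n_k is the size of cluster k and N the number of items. The function names `partition_entropy` and `rank_by_entropy`, the algorithm names, and the toy label vectors are hypothetical illustrations, not code taken from the authors' repository.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def partition_entropy(labels: Sequence[Hashable]) -> float:
    """Entropy (in nats) of the cluster-size distribution of a hard partition.

    Assumed definition: H = -sum_k (n_k / N) * ln(n_k / N), where n_k is the
    number of items in cluster k and N is the total number of items.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("cannot compute the entropy of an empty partition")
    sizes = Counter(labels)  # cluster label -> n_k
    return -sum((c / n) * math.log(c / n) for c in sizes.values())


def rank_by_entropy(
    partitions: Dict[str, Sequence[Hashable]]
) -> List[Tuple[str, float]]:
    """Rank algorithms (hypothetical names) by the entropy of their partitions.

    Only partitions with the same number of clusters are compared, so the
    function rejects inputs whose cluster counts differ.
    """
    cluster_counts = {len(set(labels)) for labels in partitions.values()}
    if len(cluster_counts) != 1:
        raise ValueError("all partitions must have the same number of clusters")
    scored = [(name, partition_entropy(labels)) for name, labels in partitions.items()]
    # Infomax-style criterion: the highest-entropy partition ranks first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy labelings of 8 items into K = 2 clusters (illustrative data only).
    partitions = {
        "algo_A": [0, 0, 0, 0, 0, 0, 1, 1],  # cluster sizes 6 and 2
        "algo_B": [0, 0, 0, 0, 1, 1, 1, 1],  # cluster sizes 4 and 4
    }
    for name, h in rank_by_entropy(partitions):
        print(f"{name}: H = {h:.3f} nats")
    # algo_B: H = 0.693 nats
    # algo_A: H = 0.562 nats
```

Because the largest entropy attainable with K clusters is ln K, comparing partitions with different K would systematically favor finer partitions. For that reason the sketch only ranks partitions with equal cluster counts, mirroring the abstract's "for a given number of clusters".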
Record ID: pubmed-7588117
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One, Research Article, published 2020-10-26
License: © 2020 Sikdar et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.