Unsupervised ranking of clustering algorithms by INFOMAX
Clustering and community detection provide a concise way of extracting meaningful information from large datasets. An ever growing plethora of data clustering and community detection algorithms have been proposed. In this paper, we address the question of ranking the performance of clustering algori...
Main authors: | Sikdar, Sandipan; Mukherjee, Animesh; Marsili, Matteo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2020 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7588117/ https://www.ncbi.nlm.nih.gov/pubmed/33104709 http://dx.doi.org/10.1371/journal.pone.0239331 |
_version_ | 1783600315815165952 |
---|---|
author | Sikdar, Sandipan Mukherjee, Animesh Marsili, Matteo |
author_sort | Sikdar, Sandipan |
collection | PubMed |
description | Clustering and community detection provide a concise way of extracting meaningful information from large datasets. An ever-growing plethora of data clustering and community detection algorithms have been proposed. In this paper, we address the question of ranking the performance of clustering algorithms for a given dataset. We show that, for hard clustering and community detection, Linsker’s Infomax principle can be used to rank clustering algorithms. In brief, the algorithm that yields the highest value of the entropy of the partition, for a given number of clusters, is the best one. We show, on a wide range of datasets of various sizes and topological structures, that the ranking provided by the entropy of the partition over a variety of partitioning algorithms is strongly correlated with the overlap with a ground truth partition. The code for the project is available at https://github.com/Sandipan99/Ranking_cluster_algorithms. |
format | Online Article Text |
id | pubmed-7588117 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-7588117 2020-10-30 Unsupervised ranking of clustering algorithms by INFOMAX Sikdar, Sandipan Mukherjee, Animesh Marsili, Matteo PLoS One Research Article Clustering and community detection provide a concise way of extracting meaningful information from large datasets. An ever-growing plethora of data clustering and community detection algorithms have been proposed. In this paper, we address the question of ranking the performance of clustering algorithms for a given dataset. We show that, for hard clustering and community detection, Linsker’s Infomax principle can be used to rank clustering algorithms. In brief, the algorithm that yields the highest value of the entropy of the partition, for a given number of clusters, is the best one. We show, on a wide range of datasets of various sizes and topological structures, that the ranking provided by the entropy of the partition over a variety of partitioning algorithms is strongly correlated with the overlap with a ground truth partition. The code for the project is available at https://github.com/Sandipan99/Ranking_cluster_algorithms. Public Library of Science 2020-10-26 /pmc/articles/PMC7588117/ /pubmed/33104709 http://dx.doi.org/10.1371/journal.pone.0239331 Text en © 2020 Sikdar et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Unsupervised ranking of clustering algorithms by INFOMAX |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7588117/ https://www.ncbi.nlm.nih.gov/pubmed/33104709 http://dx.doi.org/10.1371/journal.pone.0239331 |
work_keys_str_mv | AT sikdarsandipan unsupervisedrankingofclusteringalgorithmsbyinfomax AT mukherjeeanimesh unsupervisedrankingofclusteringalgorithmsbyinfomax AT marsilimatteo unsupervisedrankingofclusteringalgorithmsbyinfomax |
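
The ranking rule stated in the description (for a given number of clusters, the algorithm whose partition has the highest entropy is ranked best) can be illustrated with a minimal Python sketch. This is not the authors' code from the linked repository. It assumes that "entropy of the partition" means the Shannon entropy of the relative cluster sizes, and the function names `partition_entropy` and `rank_by_entropy`, as well as the toy labels, are hypothetical.

```python
from collections import Counter
from math import log


def partition_entropy(labels):
    """Shannon entropy (in nats) of the relative cluster sizes.

    Assumption: this is what the record calls the "entropy of the partition".
    `labels` is a sequence with one hard cluster label per data point.
    """
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log(c / n) for c in counts.values())


def rank_by_entropy(partitions):
    """Return algorithm names sorted from highest to lowest partition entropy.

    `partitions` maps a (hypothetical) algorithm name to its label sequence.
    The comparison is only made at a fixed number of clusters, so partitions
    with differing cluster counts are rejected.
    """
    cluster_counts = {len(set(labels)) for labels in partitions.values()}
    if len(cluster_counts) != 1:
        raise ValueError("all partitions must have the same number of clusters")
    return sorted(
        partitions,
        key=lambda name: partition_entropy(partitions[name]),
        reverse=True,
    )


# Toy example: two made-up algorithms, each producing 2 clusters on 4 points.
toy_partitions = {
    "algo_a": [0, 0, 0, 1],  # unbalanced split, lower entropy
    "algo_b": [0, 0, 1, 1],  # balanced split, higher entropy
}
ranking = rank_by_entropy(toy_partitions)
```

In this toy case the balanced partition from `algo_b` is ranked first, because a uniform split maximizes the entropy for a fixed number of clusters.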