
Clustering assessment in weighted networks

We provide a systematic approach to validate the results of clustering methods on weighted networks, in particular for the cases where the existence of a community structure is unknown. Our validation of clustering comprises a set of criteria for assessing their significance and stability. To test f...


Bibliographic Details
Main authors: Arratia, Argimiro; Renedo Mirambell, Martí
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Algorithms and Analysis of Algorithms
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8237321/
https://www.ncbi.nlm.nih.gov/pubmed/34239979
http://dx.doi.org/10.7717/peerj-cs.600
_version_ 1783714707709886464
author Arratia, Argimiro
Renedo Mirambell, Martí
collection PubMed
description We provide a systematic approach to validating the results of clustering methods on weighted networks, in particular for cases where the existence of a community structure is unknown. Our validation comprises a set of criteria for assessing the significance and stability of a clustering. To test for cluster significance, we introduce a set of community scoring functions adapted to weighted networks, and systematically compare their values to those of a suitable null model. For this we propose a switching model that produces randomized graphs with weighted edges while keeping the degree distribution constant. To test for cluster stability, we introduce a nonparametric bootstrap method combined with similarity metrics derived from information theory and combinatorics. To assess the effectiveness of our clustering quality evaluation methods, we test them on synthetically generated weighted networks with a ground-truth community structure of varying strength, based on the stochastic block model construction. When applied to the clusters of these synthetic ground-truth networks, as well as to other weighted networks with known community structure, the proposed methods correctly identify the best-performing algorithms, which suggests that they are adequate for cases where the clustering structure is not known. We test our clustering validation methods on a varied collection of well-known clustering algorithms applied to the synthetically generated networks and to several real-world weighted networks. All our clustering validation methods are implemented in R, and will be released in the upcoming package clustAnalytics.
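The null model mentioned in the description can be illustrated with a minimal sketch. This is not the authors' switching model and not the clustAnalytics API: it assumes the classic double edge swap, in which two edges (a, b) and (c, d) are rewired to (a, d) and (c, b) and each weight travels with its edge. Under that assumption every vertex keeps its number of neighbours and the multiset of edge weights is unchanged. The names `switch_edges`, `degrees`, and `edge_key`, and the toy graph, are hypothetical.

```python
import random


def edge_key(u, v):
    """Canonical key for an undirected edge (smaller endpoint first)."""
    return (u, v) if u < v else (v, u)


def degrees(edges):
    """Number of neighbours of each vertex in an edge -> weight mapping."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def switch_edges(edges, n_swaps, seed=0):
    """Randomize a simple undirected weighted graph by double edge swaps.

    Assumption (illustration only): each weight moves with its edge, so
    vertex degrees and the multiset of weights are both preserved.
    """
    rng = random.Random(seed)
    edges = dict(edges)            # work on a copy
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps   # guard against graphs with no valid swap
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        (a, b), (c, d) = rng.sample(sorted(edges), 2)
        if rng.random() < 0.5:     # pick one of the two possible rewirings
            c, d = d, c
        if len({a, b, c, d}) < 4:  # shared endpoint would create a self-loop
            continue
        new1, new2 = edge_key(a, d), edge_key(c, b)
        if new1 in edges or new2 in edges:  # would create a multi-edge
            continue
        w1 = edges.pop(edge_key(a, b))
        w2 = edges.pop(edge_key(c, d))
        edges[new1], edges[new2] = w1, w2
        done += 1
    return edges


# Toy graph: an 8-cycle with two chords and distinct weights.
original = {edge_key(i, (i + 1) % 8): 1.0 + i for i in range(8)}
original[edge_key(0, 4)] = 10.0
original[edge_key(1, 5)] = 11.0

randomized = switch_edges(original, n_swaps=20, seed=42)
print(degrees(original) == degrees(randomized))
```

A significance test in the spirit of the description would compute a scoring function on the clustering of the original graph and on the clusterings of many such randomized graphs, then compare the values. A variant that preserves weighted degree (strength) instead would have to move weight between edges rather than carry it along.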
format Online
Article
Text
id pubmed-8237321
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8237321 2021-07-07 Clustering assessment in weighted networks Arratia, Argimiro Renedo Mirambell, Martí PeerJ Comput Sci Algorithms and Analysis of Algorithms PeerJ Inc. 
2021-06-18 /pmc/articles/PMC8237321/ /pubmed/34239979 http://dx.doi.org/10.7717/peerj-cs.600 Text en ©2021 Arratia and Renedo Mirambell https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title Clustering assessment in weighted networks
topic Algorithms and Analysis of Algorithms