Cargando…

Constant community identification in million-scale networks

The inherently stochastic nature of community detection in real-world complex networks poses an important challenge in assessing the accuracy of the results. In order to eliminate the algorithmic and implementation artifacts, it is necessary to identify the groups of vertices that are always cluster...

Descripción completa

Detalles Bibliográficos
Autores principales: Chowdhury, Anjan, Srinivasan, Sriram, Bhowmick, Sanjukta, Mukherjee, Animesh, Ghosh, Kuntal
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Springer Vienna 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9243870/
https://www.ncbi.nlm.nih.gov/pubmed/35789889
http://dx.doi.org/10.1007/s13278-022-00895-8
_version_ 1784738406253723648
author Chowdhury, Anjan
Srinivasan, Sriram
Bhowmick, Sanjukta
Mukherjee, Animesh
Ghosh, Kuntal
author_facet Chowdhury, Anjan
Srinivasan, Sriram
Bhowmick, Sanjukta
Mukherjee, Animesh
Ghosh, Kuntal
author_sort Chowdhury, Anjan
collection PubMed
description The inherently stochastic nature of community detection in real-world complex networks poses an important challenge in assessing the accuracy of the results. In order to eliminate the algorithmic and implementation artifacts, it is necessary to identify the groups of vertices that are always clustered together, independent of the community detection algorithm used. Such groups of vertices are called constant communities. Current approaches for finding constant communities are very expensive and do not scale to large networks. In this paper, we use binary edge classification to find constant communities. The key idea is to classify edges based on whether they form a constant community or not. We present two methods for edge classification. The first is a GCN-based semi-supervised approach that we term Line-GCN. The second is an unsupervised approach based on image thresholding methods. Neither of these methods requires explicit detection of communities and can thus scale to very large networks of the order of millions of vertices. Both of our semi-supervised and unsupervised results on real-world graphs demonstrate that the constant communities obtained by our method have higher F1-scores and comparable or higher NMI scores than other state-of-the-art baseline methods for constant community detection. While the training step of Line-GCN can be expensive, the unsupervised algorithm is 10 times faster than the baseline methods. For larger networks, the baseline methods cannot complete, whereas all of our algorithms can find constant communities in a reasonable amount of time. Finally, we also demonstrate that our methods are robust under noisy conditions. We use three different, well-studied noise models to add noise to the networks and show that our results are mostly stable.
format Online
Article
Text
id pubmed-9243870
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer Vienna
record_format MEDLINE/PubMed
spelling pubmed-92438702022-06-30 Constant community identification in million-scale networks Chowdhury, Anjan Srinivasan, Sriram Bhowmick, Sanjukta Mukherjee, Animesh Ghosh, Kuntal Soc Netw Anal Min Original Article The inherently stochastic nature of community detection in real-world complex networks poses an important challenge in assessing the accuracy of the results. In order to eliminate the algorithmic and implementation artifacts, it is necessary to identify the groups of vertices that are always clustered together, independent of the community detection algorithm used. Such groups of vertices are called constant communities. Current approaches for finding constant communities are very expensive and do not scale to large networks. In this paper, we use binary edge classification to find constant communities. The key idea is to classify edges based on whether they form a constant community or not. We present two methods for edge classification. The first is a GCN-based semi-supervised approach that we term Line-GCN. The second is an unsupervised approach based on image thresholding methods. Neither of these methods requires explicit detection of communities and can thus scale to very large networks of the order of millions of vertices. Both of our semi-supervised and unsupervised results on real-world graphs demonstrate that the constant communities obtained by our method have higher F1-scores and comparable or higher NMI scores than other state-of-the-art baseline methods for constant community detection. While the training step of Line-GCN can be expensive, the unsupervised algorithm is 10 times faster than the baseline methods. For larger networks, the baseline methods cannot complete, whereas all of our algorithms can find constant communities in a reasonable amount of time. Finally, we also demonstrate that our methods are robust under noisy conditions. We use three different, well-studied noise models to add noise to the networks and show that our results are mostly stable. Springer Vienna 2022-06-30 2022 /pmc/articles/PMC9243870/ /pubmed/35789889 http://dx.doi.org/10.1007/s13278-022-00895-8 Text en © The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2022 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Original Article
Chowdhury, Anjan
Srinivasan, Sriram
Bhowmick, Sanjukta
Mukherjee, Animesh
Ghosh, Kuntal
Constant community identification in million-scale networks
title Constant community identification in million-scale networks
title_full Constant community identification in million-scale networks
title_fullStr Constant community identification in million-scale networks
title_full_unstemmed Constant community identification in million-scale networks
title_short Constant community identification in million-scale networks
title_sort constant community identification in million-scale networks
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9243870/
https://www.ncbi.nlm.nih.gov/pubmed/35789889
http://dx.doi.org/10.1007/s13278-022-00895-8
work_keys_str_mv AT chowdhuryanjan constantcommunityidentificationinmillionscalenetworks
AT srinivasansriram constantcommunityidentificationinmillionscalenetworks
AT bhowmicksanjukta constantcommunityidentificationinmillionscalenetworks
AT mukherjeeanimesh constantcommunityidentificationinmillionscalenetworks
AT ghoshkuntal constantcommunityidentificationinmillionscalenetworks