Cargando…
Constant community identification in million-scale networks
The inherently stochastic nature of community detection in real-world complex networks poses an important challenge in assessing the accuracy of the results. In order to eliminate the algorithmic and implementation artifacts, it is necessary to identify the groups of vertices that are always cluster...
Autores principales: | , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Springer Vienna
2022
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9243870/ https://www.ncbi.nlm.nih.gov/pubmed/35789889 http://dx.doi.org/10.1007/s13278-022-00895-8 |
_version_ | 1784738406253723648 |
---|---|
author | Chowdhury, Anjan Srinivasan, Sriram Bhowmick, Sanjukta Mukherjee, Animesh Ghosh, Kuntal |
author_facet | Chowdhury, Anjan Srinivasan, Sriram Bhowmick, Sanjukta Mukherjee, Animesh Ghosh, Kuntal |
author_sort | Chowdhury, Anjan |
collection | PubMed |
description | The inherently stochastic nature of community detection in real-world complex networks poses an important challenge in assessing the accuracy of the results. In order to eliminate the algorithmic and implementation artifacts, it is necessary to identify the groups of vertices that are always clustered together, independent of the community detection algorithm used. Such groups of vertices are called constant communities. Current approaches for finding constant communities are very expensive and do not scale to large networks. In this paper, we use binary edge classification to find constant communities. The key idea is to classify edges based on whether they form a constant community or not. We present two methods for edge classification. The first is a GCN-based semi-supervised approach that we term Line-GCN. The second is an unsupervised approach based on image thresholding methods. Neither of these methods requires explicit detection of communities and can thus scale to very large networks of the order of millions of vertices. Both of our semi-supervised and unsupervised results on real-world graphs demonstrate that the constant communities obtained by our method have higher F1-scores and comparable or higher NMI scores than other state-of-the-art baseline methods for constant community detection. While the training step of Line-GCN can be expensive, the unsupervised algorithm is 10 times faster than the baseline methods. For larger networks, the baseline methods cannot complete, whereas all of our algorithms can find constant communities in a reasonable amount of time. Finally, we also demonstrate that our methods are robust under noisy conditions. We use three different, well-studied noise models to add noise to the networks and show that our results are mostly stable. |
format | Online Article Text |
id | pubmed-9243870 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer Vienna |
record_format | MEDLINE/PubMed |
spelling | pubmed-92438702022-06-30 Constant community identification in million-scale networks Chowdhury, Anjan Srinivasan, Sriram Bhowmick, Sanjukta Mukherjee, Animesh Ghosh, Kuntal Soc Netw Anal Min Original Article The inherently stochastic nature of community detection in real-world complex networks poses an important challenge in assessing the accuracy of the results. In order to eliminate the algorithmic and implementation artifacts, it is necessary to identify the groups of vertices that are always clustered together, independent of the community detection algorithm used. Such groups of vertices are called constant communities. Current approaches for finding constant communities are very expensive and do not scale to large networks. In this paper, we use binary edge classification to find constant communities. The key idea is to classify edges based on whether they form a constant community or not. We present two methods for edge classification. The first is a GCN-based semi-supervised approach that we term Line-GCN. The second is an unsupervised approach based on image thresholding methods. Neither of these methods requires explicit detection of communities and can thus scale to very large networks of the order of millions of vertices. Both of our semi-supervised and unsupervised results on real-world graphs demonstrate that the constant communities obtained by our method have higher F1-scores and comparable or higher NMI scores than other state-of-the-art baseline methods for constant community detection. While the training step of Line-GCN can be expensive, the unsupervised algorithm is 10 times faster than the baseline methods. For larger networks, the baseline methods cannot complete, whereas all of our algorithms can find constant communities in a reasonable amount of time. Finally, we also demonstrate that our methods are robust under noisy conditions. We use three different, well-studied noise models to add noise to the networks and show that our results are mostly stable. Springer Vienna 2022-06-30 2022 /pmc/articles/PMC9243870/ /pubmed/35789889 http://dx.doi.org/10.1007/s13278-022-00895-8 Text en © The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2022 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Original Article Chowdhury, Anjan Srinivasan, Sriram Bhowmick, Sanjukta Mukherjee, Animesh Ghosh, Kuntal Constant community identification in million-scale networks |
title | Constant community identification in million-scale networks |
title_full | Constant community identification in million-scale networks |
title_fullStr | Constant community identification in million-scale networks |
title_full_unstemmed | Constant community identification in million-scale networks |
title_short | Constant community identification in million-scale networks |
title_sort | constant community identification in million-scale networks |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9243870/ https://www.ncbi.nlm.nih.gov/pubmed/35789889 http://dx.doi.org/10.1007/s13278-022-00895-8 |
work_keys_str_mv | AT chowdhuryanjan constantcommunityidentificationinmillionscalenetworks AT srinivasansriram constantcommunityidentificationinmillionscalenetworks AT bhowmicksanjukta constantcommunityidentificationinmillionscalenetworks AT mukherjeeanimesh constantcommunityidentificationinmillionscalenetworks AT ghoshkuntal constantcommunityidentificationinmillionscalenetworks |