Cargando…

Constant community identification in million-scale networks

The inherently stochastic nature of community detection in real-world complex networks poses an important challenge in assessing the accuracy of the results. In order to eliminate the algorithmic and implementation artifacts, it is necessary to identify the groups of vertices that are always cluster...

Descripción completa

Detalles Bibliográficos
Autores principales:	Chowdhury, Anjan, Srinivasan, Sriram, Bhowmick, Sanjukta, Mukherjee, Animesh, Ghosh, Kuntal
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Springer Vienna 2022
Materias:	Original Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9243870/ https://www.ncbi.nlm.nih.gov/pubmed/35789889 http://dx.doi.org/10.1007/s13278-022-00895-8

_version_	1784738406253723648
author	Chowdhury, Anjan Srinivasan, Sriram Bhowmick, Sanjukta Mukherjee, Animesh Ghosh, Kuntal
author_facet	Chowdhury, Anjan Srinivasan, Sriram Bhowmick, Sanjukta Mukherjee, Animesh Ghosh, Kuntal
author_sort	Chowdhury, Anjan
collection	PubMed
description	The inherently stochastic nature of community detection in real-world complex networks poses an important challenge in assessing the accuracy of the results. In order to eliminate the algorithmic and implementation artifacts, it is necessary to identify the groups of vertices that are always clustered together, independent of the community detection algorithm used. Such groups of vertices are called constant communities. Current approaches for finding constant communities are very expensive and do not scale to large networks. In this paper, we use binary edge classification to find constant communities. The key idea is to classify edges based on whether they form a constant community or not. We present two methods for edge classification. The first is a GCN-based semi-supervised approach that we term Line-GCN. The second is an unsupervised approach based on image thresholding methods. Neither of these methods requires explicit detection of communities and can thus scale to very large networks of the order of millions of vertices. Both of our semi-supervised and unsupervised results on real-world graphs demonstrate that the constant communities obtained by our method have higher F1-scores and comparable or higher NMI scores than other state-of-the-art baseline methods for constant community detection. While the training step of Line-GCN can be expensive, the unsupervised algorithm is 10 times faster than the baseline methods. For larger networks, the baseline methods cannot complete, whereas all of our algorithms can find constant communities in a reasonable amount of time. Finally, we also demonstrate that our methods are robust under noisy conditions. We use three different, well-studied noise models to add noise to the networks and show that our results are mostly stable.
format	Online Article Text
id	pubmed-9243870
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Springer Vienna
record_format	MEDLINE/PubMed
spelling	pubmed-92438702022-06-30 Constant community identification in million-scale networks Chowdhury, Anjan Srinivasan, Sriram Bhowmick, Sanjukta Mukherjee, Animesh Ghosh, Kuntal Soc Netw Anal Min Original Article The inherently stochastic nature of community detection in real-world complex networks poses an important challenge in assessing the accuracy of the results. In order to eliminate the algorithmic and implementation artifacts, it is necessary to identify the groups of vertices that are always clustered together, independent of the community detection algorithm used. Such groups of vertices are called constant communities. Current approaches for finding constant communities are very expensive and do not scale to large networks. In this paper, we use binary edge classification to find constant communities. The key idea is to classify edges based on whether they form a constant community or not. We present two methods for edge classification. The first is a GCN-based semi-supervised approach that we term Line-GCN. The second is an unsupervised approach based on image thresholding methods. Neither of these methods requires explicit detection of communities and can thus scale to very large networks of the order of millions of vertices. Both of our semi-supervised and unsupervised results on real-world graphs demonstrate that the constant communities obtained by our method have higher F1-scores and comparable or higher NMI scores than other state-of-the-art baseline methods for constant community detection. While the training step of Line-GCN can be expensive, the unsupervised algorithm is 10 times faster than the baseline methods. For larger networks, the baseline methods cannot complete, whereas all of our algorithms can find constant communities in a reasonable amount of time. Finally, we also demonstrate that our methods are robust under noisy conditions. We use three different, well-studied noise models to add noise to the networks and show that our results are mostly stable. Springer Vienna 2022-06-30 2022 /pmc/articles/PMC9243870/ /pubmed/35789889 http://dx.doi.org/10.1007/s13278-022-00895-8 Text en © The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2022 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle	Original Article Chowdhury, Anjan Srinivasan, Sriram Bhowmick, Sanjukta Mukherjee, Animesh Ghosh, Kuntal Constant community identification in million-scale networks
title	Constant community identification in million-scale networks
title_full	Constant community identification in million-scale networks
title_fullStr	Constant community identification in million-scale networks
title_full_unstemmed	Constant community identification in million-scale networks
title_short	Constant community identification in million-scale networks
title_sort	constant community identification in million-scale networks
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9243870/ https://www.ncbi.nlm.nih.gov/pubmed/35789889 http://dx.doi.org/10.1007/s13278-022-00895-8
work_keys_str_mv	AT chowdhuryanjan constantcommunityidentificationinmillionscalenetworks AT srinivasansriram constantcommunityidentificationinmillionscalenetworks AT bhowmicksanjukta constantcommunityidentificationinmillionscalenetworks AT mukherjeeanimesh constantcommunityidentificationinmillionscalenetworks AT ghoshkuntal constantcommunityidentificationinmillionscalenetworks

Constant community identification in million-scale networks

Ejemplares similares