Cargando…

Scalable probabilistic truss decomposition using central limit theorem and H-index

Truss decomposition is a popular notion of hierarchical dense substructures in graphs. In a nutshell, k-truss is the largest subgraph in which every edge is contained in at least k triangles. Truss decomposition aims to compute k-trusses for each possible value of k. There are many works that study...

Descripción completa

Detalles Bibliográficos
Autores principales:	Esfahani, Fatemeh, Daneshmand, Mahsa, Srinivasan, Venkatesh, Thomo, Alex, Wu, Kui
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Springer US 2022
Materias:	Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9310023/ https://www.ncbi.nlm.nih.gov/pubmed/35911177 http://dx.doi.org/10.1007/s10619-022-07415-9

_version_	1784753301243297792
author	Esfahani, Fatemeh Daneshmand, Mahsa Srinivasan, Venkatesh Thomo, Alex Wu, Kui
author_facet	Esfahani, Fatemeh Daneshmand, Mahsa Srinivasan, Venkatesh Thomo, Alex Wu, Kui
author_sort	Esfahani, Fatemeh
collection	PubMed
description	Truss decomposition is a popular notion of hierarchical dense substructures in graphs. In a nutshell, k-truss is the largest subgraph in which every edge is contained in at least k triangles. Truss decomposition aims to compute k-trusses for each possible value of k. There are many works that study truss decomposition in deterministic graphs. However, in probabilistic graphs, truss decomposition is significantly more challenging and has received much less attention; state-of-the-art approaches do not scale well to large probabilistic graphs. Finding the tail probabilities of the number of triangles that contain each edge is a critical challenge of those approaches. This is achieved using dynamic programming which has quadratic run-time and thus not scalable to real large networks which, quite commonly, can have edges contained in many triangles (in the millions). To address this challenge, we employ a special version of the Central Limit Theorem (CLT) to obtain the tail probabilities efficiently. Based on our CLT approach we propose a peeling algorithm for truss decomposition that scales to large probabilistic graphs and offers significant improvement over state-of-the-art. We also design a second method which progressively tightens the estimate of the truss value of each edge and is based on h-index computation. In contrast to our CLT-based approach, our h-index algorithm (1) is progressive by allowing the user to see near-results along the way, (2) does not sacrifice the exactness of final result, and (3) achieves all these while processing only one edge and its immediate neighbors at a time, thus resulting in smaller memory footprint. We perform extensive experiments to show the scalability of both of our proposed algorithms.
format	Online Article Text
id	pubmed-9310023
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Springer US
record_format	MEDLINE/PubMed
spelling	pubmed-93100232022-07-25 Scalable probabilistic truss decomposition using central limit theorem and H-index Esfahani, Fatemeh Daneshmand, Mahsa Srinivasan, Venkatesh Thomo, Alex Wu, Kui Distrib Parallel Databases Article Truss decomposition is a popular notion of hierarchical dense substructures in graphs. In a nutshell, k-truss is the largest subgraph in which every edge is contained in at least k triangles. Truss decomposition aims to compute k-trusses for each possible value of k. There are many works that study truss decomposition in deterministic graphs. However, in probabilistic graphs, truss decomposition is significantly more challenging and has received much less attention; state-of-the-art approaches do not scale well to large probabilistic graphs. Finding the tail probabilities of the number of triangles that contain each edge is a critical challenge of those approaches. This is achieved using dynamic programming which has quadratic run-time and thus not scalable to real large networks which, quite commonly, can have edges contained in many triangles (in the millions). To address this challenge, we employ a special version of the Central Limit Theorem (CLT) to obtain the tail probabilities efficiently. Based on our CLT approach we propose a peeling algorithm for truss decomposition that scales to large probabilistic graphs and offers significant improvement over state-of-the-art. We also design a second method which progressively tightens the estimate of the truss value of each edge and is based on h-index computation. In contrast to our CLT-based approach, our h-index algorithm (1) is progressive by allowing the user to see near-results along the way, (2) does not sacrifice the exactness of final result, and (3) achieves all these while processing only one edge and its immediate neighbors at a time, thus resulting in smaller memory footprint. We perform extensive experiments to show the scalability of both of our proposed algorithms. Springer US 2022-07-25 2022 /pmc/articles/PMC9310023/ /pubmed/35911177 http://dx.doi.org/10.1007/s10619-022-07415-9 Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle	Article Esfahani, Fatemeh Daneshmand, Mahsa Srinivasan, Venkatesh Thomo, Alex Wu, Kui Scalable probabilistic truss decomposition using central limit theorem and H-index
title	Scalable probabilistic truss decomposition using central limit theorem and H-index
title_full	Scalable probabilistic truss decomposition using central limit theorem and H-index
title_fullStr	Scalable probabilistic truss decomposition using central limit theorem and H-index
title_full_unstemmed	Scalable probabilistic truss decomposition using central limit theorem and H-index
title_short	Scalable probabilistic truss decomposition using central limit theorem and H-index
title_sort	scalable probabilistic truss decomposition using central limit theorem and h-index
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9310023/ https://www.ncbi.nlm.nih.gov/pubmed/35911177 http://dx.doi.org/10.1007/s10619-022-07415-9
work_keys_str_mv	AT esfahanifatemeh scalableprobabilistictrussdecompositionusingcentrallimittheoremandhindex AT daneshmandmahsa scalableprobabilistictrussdecompositionusingcentrallimittheoremandhindex AT srinivasanvenkatesh scalableprobabilistictrussdecompositionusingcentrallimittheoremandhindex AT thomoalex scalableprobabilistictrussdecompositionusingcentrallimittheoremandhindex AT wukui scalableprobabilistictrussdecompositionusingcentrallimittheoremandhindex

Scalable probabilistic truss decomposition using central limit theorem and H-index

Ejemplares similares