(p,q)-biclique counting and enumeration for large sparse bipartite graphs
In this paper, we study the problem of $(p,q)$-biclique counting and enumeration for large sparse bipartite graphs. Given a bipartite graph $G=(U,V,E)$ and two integer parameters $p$ and $q$, we aim to efficiently count and enumerate all $(p,q)$...
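A $(p,q)$-biclique $B(L,R)$ is a pair $L \subseteq U$, $R \subseteq V$ with $|L| = p$ and $|R| = q$ in which every vertex of $L$ is adjacent to every vertex of $R$. A brute-force counter makes this definition concrete. The sketch below is not BCList or BCList++. It is a minimal reference that assumes the graph is stored as a dictionary `adj` mapping each vertex of $U$ to its set of neighbours in $V$. The names `is_biclique` and `count_pq_bicliques_bruteforce` and the toy graph are hypothetical. The sketch tests all $\binom{|U|}{p}\binom{|V|}{q}$ candidate pairs, so it is only useful as a correctness oracle on tiny graphs.

```python
from itertools import combinations


def is_biclique(adj, L, R):
    """True if every vertex in L is adjacent to every vertex in R."""
    return all(v in adj.get(u, set()) for u in L for v in R)


def count_pq_bicliques_bruteforce(U, V, adj, p, q):
    """Count (p,q)-bicliques B(L, R) with L ⊆ U, |L| = p, R ⊆ V, |R| = q.

    Every candidate pair (L, R) is tested directly, i.e.
    C(|U|, p) * C(|V|, q) completeness checks: a tiny-graph oracle only.
    """
    return sum(
        1
        for L in combinations(sorted(U), p)
        for R in combinations(sorted(V), q)
        if is_biclique(adj, L, R)
    )


# Hypothetical toy graph: U = {u1, u2, u3}, V = {v1, v2, v3}.
U = {"u1", "u2", "u3"}
V = {"v1", "v2", "v3"}
adj = {"u1": {"v1", "v2", "v3"}, "u2": {"v1", "v2"}, "u3": {"v2", "v3"}}

print(count_pq_bicliques_bruteforce(U, V, adj, 2, 2))  # prints 2
```

On the toy graph, the two $(2,2)$-bicliques are $(\{u_1,u_2\},\{v_1,v_2\})$ and $(\{u_1,u_3\},\{v_2,v_3\})$. The pair $\{u_2,u_3\}$ shares only $v_2$, so it cannot be extended.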
Main authors: | Yang, Jianye; Peng, Yun; Ouyang, Dian; Zhang, Wenjie; Lin, Xuemin; Zhao, Xiang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Berlin Heidelberg, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10008723/ https://www.ncbi.nlm.nih.gov/pubmed/37362202 http://dx.doi.org/10.1007/s00778-023-00786-0 |
author | Yang, Jianye Peng, Yun Ouyang, Dian Zhang, Wenjie Lin, Xuemin Zhao, Xiang |
collection | PubMed |
description | In this paper, we study the problem of $(p,q)$-biclique counting and enumeration for large sparse bipartite graphs. Given a bipartite graph $G=(U,V,E)$ and two integer parameters $p$ and $q$, we aim to efficiently count and enumerate all $(p,q)$-bicliques in $G$, where a $(p,q)$-biclique $B(L,R)$ is a complete subgraph of $G$ with $L \subseteq U$, $R \subseteq V$, $|L| = p$, and $|R| = q$. The problem of $(p,q)$-biclique counting and enumeration has many applications, such as information aggregation in graph neural networks, densest subgraph detection, and cohesive subgroup analysis. Despite this wide range of applications, to the best of our knowledge, there is no efficient and scalable solution to this problem in the literature. The problem is computationally challenging because the number of $(p,q)$-bicliques can be exponential in the worst case. In this paper, we propose a competitive branch-and-bound baseline method, namely BCList, which explores the search space in a depth-first manner together with a variety of pruning techniques. Although BCList offers a useful computation framework for our problem, its worst-case time complexity is exponential in $p+q$. To alleviate this, we propose an advanced approach called BCList++. In particular, BCList++ applies a layer-based exploration strategy that enumerates $(p,q)$-bicliques by anchoring the search on either $U$ or $V$ only, so its worst-case time complexity is exponential in either $p$ or $q$ only. Consequently, a vital task is to choose the layer with the least computation cost. To this end, we develop a cost model built upon an unbiased estimator for the density of the 2-hop graph induced by $U$ or $V$.
To improve computation efficiency, BCList++ exploits pre-allocated arrays and vertex labeling techniques so that the frequent subgraph creation operations can be replaced by array element switching operations. We conduct extensive experiments on 16 real-life datasets, and the results demonstrate that BCList++ significantly outperforms the baseline methods by up to 3 orders of magnitude. We show via a case study that $(p,q)$-bicliques improve the efficiency of graph neural networks. We further extend our techniques to count and enumerate $(p,q)$-bicliques on uncertain bipartite graphs. An efficient method, IUBCList, is developed on top of BCList++, together with a couple of pruning techniques, including common neighbor refinement and search branch early termination, to discard unpromising uncertain $(p,q)$-bicliques early. The experimental results demonstrate that IUBCList significantly outperforms the baseline method by up to 2 orders of magnitude. |
format | Online Article Text |
id | pubmed-10008723 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Springer Berlin Heidelberg |
record_format | MEDLINE/PubMed |
source | VLDB J, Regular Paper; Springer Berlin Heidelberg; published 2023-03-13 |
rights | © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | (p,q)-biclique counting and enumeration for large sparse bipartite graphs |
topic | Regular Paper |
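The layer-based strategy of BCList++ anchors the search on one side only. A simplified Python sketch can illustrate the idea. It is an illustration under stated assumptions, not the authors' implementation. The sketch enumerates $k$-subsets of the anchor layer depth-first and keeps the common neighbourhood of the chosen anchors as the candidate set on the other layer. It prunes a branch once fewer than $m$ candidates remain. Each surviving anchor set is then completed by choosing $m$ of its common neighbours, so the recursion depth is bounded by the anchor parameter alone.

Two simplifications are deliberate:

- The anchor layer is picked by a hypothetical rule (the side with the smaller subset size), not by the paper's cost model based on a 2-hop density estimator.
- Candidate sets are fresh Python sets, not pre-allocated arrays with element switching.

All function names are hypothetical.

```python
from itertools import combinations
from math import comb


def build_adjacency(edges):
    """Neighbour sets for both layers, given (u, v) edges with u in U, v in V."""
    adj_u, adj_v = {}, {}
    for u, v in edges:
        adj_u.setdefault(u, set()).add(v)
        adj_v.setdefault(v, set()).add(u)
    return adj_u, adj_v


def _anchor_search(adj, k, m, emit):
    """Depth-first search over k-subsets of the anchor layer.

    `adj` maps each anchor vertex to its neighbour set on the other layer.
    `common` is the common neighbourhood of the anchors chosen so far; a branch
    is pruned as soon as it holds fewer than m vertices, because common
    neighbourhoods only shrink as more anchors are added.
    """
    order = sorted(adj)

    def dfs(start, chosen, common):
        if len(chosen) == k:
            emit(tuple(chosen), common)
            return
        # Leave room for the k - len(chosen) anchors still to be picked.
        for i in range(start, len(order) - (k - len(chosen)) + 1):
            u = order[i]
            nxt = adj[u] if common is None else common & adj[u]
            if len(nxt) < m:
                continue  # cannot reach m vertices on the other layer
            chosen.append(u)
            dfs(i + 1, chosen, nxt)
            chosen.pop()

    dfs(0, [], None)


def count_pq_bicliques(edges, p, q):
    """Count (p,q)-bicliques (|L| = p on side U, |R| = q on side V)."""
    if p < 1 or q < 1:
        raise ValueError("p and q must be positive integers")
    adj_u, adj_v = build_adjacency(edges)
    # Hypothetical layer choice: anchor on the side with the smaller subset
    # size so the search depth is min(p, q). The paper uses a cost model.
    adj, k, m = (adj_u, p, q) if p <= q else (adj_v, q, p)
    total = 0

    def add(_anchors, common):
        nonlocal total
        total += comb(len(common), m)  # each m-subset completes one biclique

    _anchor_search(adj, k, m, add)
    return total


def enumerate_pq_bicliques(edges, p, q):
    """List all (L, R) pairs as sorted tuples, anchoring the search on U."""
    if p < 1 or q < 1:
        raise ValueError("p and q must be positive integers")
    adj_u, _ = build_adjacency(edges)
    found = []

    def collect(anchors, common):
        for R in combinations(sorted(common), q):
            found.append((anchors, R))

    _anchor_search(adj_u, p, q, collect)
    return found
```

Here is an example edge list: `[("u1", "v1"), ("u1", "v2"), ("u1", "v3"), ("u2", "v1"), ("u2", "v2"), ("u3", "v2"), ("u3", "v3")]`. For it, `count_pq_bicliques(edges, 2, 2)` returns 2. Counting needs only the size of each surviving candidate set, via $\binom{|C|}{m}$, so it avoids materialising bicliques. Enumeration has to expand every $m$-subset explicitly.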