Cargando…
DHPV: a distributed algorithm for large-scale graph partitioning
Big graphs are part of the movement of “Not Only SQL” databases (also called NoSQL) focusing on the relationships between data, rather than the values themselves. The data is stored in vertices while the edges model the interactions or relationships between these data. They offer flexibility in hand...
Autores principales: | , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Springer International Publishing
2020
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7493068/ https://www.ncbi.nlm.nih.gov/pubmed/32953386 http://dx.doi.org/10.1186/s40537-020-00357-y |
_version_ | 1783582492131852288 |
---|---|
author | Adoni, Wilfried Yves Hamilton Nahhal, Tarik Krichen, Moez El byed, Abdeltif Assayad, Ismail |
author_facet | Adoni, Wilfried Yves Hamilton Nahhal, Tarik Krichen, Moez El byed, Abdeltif Assayad, Ismail |
author_sort | Adoni, Wilfried Yves Hamilton |
collection | PubMed |
description | Big graphs are part of the movement of “Not Only SQL” databases (also called NoSQL) focusing on the relationships between data, rather than the values themselves. The data is stored in vertices while the edges model the interactions or relationships between these data. They offer flexibility in handling data that is strongly connected to each other. The analysis of a big graph generally involves exploring all of its vertices. Thus, this operation is costly in time and resources because big graphs are generally composed of millions of vertices connected through billions of edges. Consequently, the graph algorithms are expansive compared to the size of the big graph, and are therefore ineffective for data exploration. Thus, partitioning the graph stands out as an efficient and less expensive alternative for exploring a big graph. This technique consists in partitioning the graph into a set of k sub-graphs in order to reduce the complexity of the queries. Nevertheless, it presents many challenges because it is an NP-complete problem. In this article, we present DPHV (Distributed Placement of Hub-Vertices) an efficient parallel and distributed heuristic for large-scale graph partitioning. An application on a real-world graphs demonstrates the feasibility and reliability of our method. The experiments carried on a 10-nodes Spark cluster proved that the proposed methodology achieves significant gain in term of time and outperforms JA-BE-JA, Greedy, DFEP. |
format | Online Article Text |
id | pubmed-7493068 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-74930682020-09-16 DHPV: a distributed algorithm for large-scale graph partitioning Adoni, Wilfried Yves Hamilton Nahhal, Tarik Krichen, Moez El byed, Abdeltif Assayad, Ismail J Big Data Methodology Big graphs are part of the movement of “Not Only SQL” databases (also called NoSQL) focusing on the relationships between data, rather than the values themselves. The data is stored in vertices while the edges model the interactions or relationships between these data. They offer flexibility in handling data that is strongly connected to each other. The analysis of a big graph generally involves exploring all of its vertices. Thus, this operation is costly in time and resources because big graphs are generally composed of millions of vertices connected through billions of edges. Consequently, the graph algorithms are expansive compared to the size of the big graph, and are therefore ineffective for data exploration. Thus, partitioning the graph stands out as an efficient and less expensive alternative for exploring a big graph. This technique consists in partitioning the graph into a set of k sub-graphs in order to reduce the complexity of the queries. Nevertheless, it presents many challenges because it is an NP-complete problem. In this article, we present DPHV (Distributed Placement of Hub-Vertices) an efficient parallel and distributed heuristic for large-scale graph partitioning. An application on a real-world graphs demonstrates the feasibility and reliability of our method. The experiments carried on a 10-nodes Spark cluster proved that the proposed methodology achieves significant gain in term of time and outperforms JA-BE-JA, Greedy, DFEP. Springer International Publishing 2020-09-16 2020 /pmc/articles/PMC7493068/ /pubmed/32953386 http://dx.doi.org/10.1186/s40537-020-00357-y Text en © The Author(s) 2020 Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Methodology Adoni, Wilfried Yves Hamilton Nahhal, Tarik Krichen, Moez El byed, Abdeltif Assayad, Ismail DHPV: a distributed algorithm for large-scale graph partitioning |
title | DHPV: a distributed algorithm for large-scale graph partitioning |
title_full | DHPV: a distributed algorithm for large-scale graph partitioning |
title_fullStr | DHPV: a distributed algorithm for large-scale graph partitioning |
title_full_unstemmed | DHPV: a distributed algorithm for large-scale graph partitioning |
title_short | DHPV: a distributed algorithm for large-scale graph partitioning |
title_sort | dhpv: a distributed algorithm for large-scale graph partitioning |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7493068/ https://www.ncbi.nlm.nih.gov/pubmed/32953386 http://dx.doi.org/10.1186/s40537-020-00357-y |
work_keys_str_mv | AT adoniwilfriedyveshamilton dhpvadistributedalgorithmforlargescalegraphpartitioning AT nahhaltarik dhpvadistributedalgorithmforlargescalegraphpartitioning AT krichenmoez dhpvadistributedalgorithmforlargescalegraphpartitioning AT elbyedabdeltif dhpvadistributedalgorithmforlargescalegraphpartitioning AT assayadismail dhpvadistributedalgorithmforlargescalegraphpartitioning |