Cargando…

DHPV: a distributed algorithm for large-scale graph partitioning

Big graphs are part of the movement of “Not Only SQL” databases (also called NoSQL) focusing on the relationships between data, rather than the values themselves. The data is stored in vertices while the edges model the interactions or relationships between these data. They offer flexibility in hand...

Descripción completa

Detalles Bibliográficos
Autores principales: Adoni, Wilfried Yves Hamilton, Nahhal, Tarik, Krichen, Moez, El byed, Abdeltif, Assayad, Ismail
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Springer International Publishing 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7493068/
https://www.ncbi.nlm.nih.gov/pubmed/32953386
http://dx.doi.org/10.1186/s40537-020-00357-y
_version_ 1783582492131852288
author Adoni, Wilfried Yves Hamilton
Nahhal, Tarik
Krichen, Moez
El byed, Abdeltif
Assayad, Ismail
author_facet Adoni, Wilfried Yves Hamilton
Nahhal, Tarik
Krichen, Moez
El byed, Abdeltif
Assayad, Ismail
author_sort Adoni, Wilfried Yves Hamilton
collection PubMed
description Big graphs are part of the movement of “Not Only SQL” databases (also called NoSQL) focusing on the relationships between data, rather than the values themselves. The data is stored in vertices while the edges model the interactions or relationships between these data. They offer flexibility in handling data that is strongly connected to each other. The analysis of a big graph generally involves exploring all of its vertices. Thus, this operation is costly in time and resources because big graphs are generally composed of millions of vertices connected through billions of edges. Consequently, the graph algorithms are expansive compared to the size of the big graph, and are therefore ineffective for data exploration. Thus, partitioning the graph stands out as an efficient and less expensive alternative for exploring a big graph. This technique consists in partitioning the graph into a set of k sub-graphs in order to reduce the complexity of the queries. Nevertheless, it presents many challenges because it is an NP-complete problem. In this article, we present DPHV (Distributed Placement of Hub-Vertices) an efficient parallel and distributed heuristic for large-scale graph partitioning. An application on a real-world graphs demonstrates the feasibility and reliability of our method. The experiments carried on a 10-nodes Spark cluster proved that the proposed methodology achieves significant gain in term of time and outperforms JA-BE-JA, Greedy, DFEP.
format Online
Article
Text
id pubmed-7493068
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-74930682020-09-16 DHPV: a distributed algorithm for large-scale graph partitioning Adoni, Wilfried Yves Hamilton Nahhal, Tarik Krichen, Moez El byed, Abdeltif Assayad, Ismail J Big Data Methodology Big graphs are part of the movement of “Not Only SQL” databases (also called NoSQL) focusing on the relationships between data, rather than the values themselves. The data is stored in vertices while the edges model the interactions or relationships between these data. They offer flexibility in handling data that is strongly connected to each other. The analysis of a big graph generally involves exploring all of its vertices. Thus, this operation is costly in time and resources because big graphs are generally composed of millions of vertices connected through billions of edges. Consequently, the graph algorithms are expansive compared to the size of the big graph, and are therefore ineffective for data exploration. Thus, partitioning the graph stands out as an efficient and less expensive alternative for exploring a big graph. This technique consists in partitioning the graph into a set of k sub-graphs in order to reduce the complexity of the queries. Nevertheless, it presents many challenges because it is an NP-complete problem. In this article, we present DPHV (Distributed Placement of Hub-Vertices) an efficient parallel and distributed heuristic for large-scale graph partitioning. An application on a real-world graphs demonstrates the feasibility and reliability of our method. The experiments carried on a 10-nodes Spark cluster proved that the proposed methodology achieves significant gain in term of time and outperforms JA-BE-JA, Greedy, DFEP. Springer International Publishing 2020-09-16 2020 /pmc/articles/PMC7493068/ /pubmed/32953386 http://dx.doi.org/10.1186/s40537-020-00357-y Text en © The Author(s) 2020 Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Methodology
Adoni, Wilfried Yves Hamilton
Nahhal, Tarik
Krichen, Moez
El byed, Abdeltif
Assayad, Ismail
DHPV: a distributed algorithm for large-scale graph partitioning
title DHPV: a distributed algorithm for large-scale graph partitioning
title_full DHPV: a distributed algorithm for large-scale graph partitioning
title_fullStr DHPV: a distributed algorithm for large-scale graph partitioning
title_full_unstemmed DHPV: a distributed algorithm for large-scale graph partitioning
title_short DHPV: a distributed algorithm for large-scale graph partitioning
title_sort dhpv: a distributed algorithm for large-scale graph partitioning
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7493068/
https://www.ncbi.nlm.nih.gov/pubmed/32953386
http://dx.doi.org/10.1186/s40537-020-00357-y
work_keys_str_mv AT adoniwilfriedyveshamilton dhpvadistributedalgorithmforlargescalegraphpartitioning
AT nahhaltarik dhpvadistributedalgorithmforlargescalegraphpartitioning
AT krichenmoez dhpvadistributedalgorithmforlargescalegraphpartitioning
AT elbyedabdeltif dhpvadistributedalgorithmforlargescalegraphpartitioning
AT assayadismail dhpvadistributedalgorithmforlargescalegraphpartitioning