
A distributed query execution engine of big attributed graphs

Graphs are a popular data model, widely used for modeling structural relationships between objects. In many real-world graphs, the vertices and edges need to be associated with descriptive attributes; such graphs are referred to as attributed graphs. G-...


Bibliographic Details
Main Authors:	Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2016
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4899405/ https://www.ncbi.nlm.nih.gov/pubmed/27350905 http://dx.doi.org/10.1186/s40064-016-2251-0

_version_	1782436453805981696
author	Batarfi, Omar Elshawi, Radwa Fayoumi, Ayman Barnawi, Ahmed Sakr, Sherif
author_facet	Batarfi, Omar Elshawi, Radwa Fayoumi, Ayman Barnawi, Ahmed Sakr, Sherif
author_sort	Batarfi, Omar
collection	PubMed
description	Graphs are a popular data model, widely used for modeling structural relationships between objects. In many real-world graphs, the vertices and edges need to be associated with descriptive attributes; such graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations, including reachability, pattern matching and shortest path, and any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges and vertices in addition to structural predicates. A main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation and performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine for G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes, while the graph data are maintained in a relational store that is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries that are pushed to the underlying relational stores, while other parts of the query plan are evaluated, as necessary, via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and scalability of DG-SPARQL in querying massive attributed graph datasets, as well as its ability to outperform Apache Giraph, a popular distributed graph processing system, by orders of magnitude.
format	Online Article Text
id	pubmed-4899405
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-4899405 2016-06-27 A distributed query execution engine of big attributed graphs Batarfi, Omar Elshawi, Radwa Fayoumi, Ayman Barnawi, Ahmed Sakr, Sherif Springerplus Research Springer International Publishing 2016-05-23 /pmc/articles/PMC4899405/ /pubmed/27350905 http://dx.doi.org/10.1186/s40064-016-2251-0 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
spellingShingle	Research Batarfi, Omar Elshawi, Radwa Fayoumi, Ayman Barnawi, Ahmed Sakr, Sherif A distributed query execution engine of big attributed graphs
title	A distributed query execution engine of big attributed graphs
title_full	A distributed query execution engine of big attributed graphs
title_fullStr	A distributed query execution engine of big attributed graphs
title_full_unstemmed	A distributed query execution engine of big attributed graphs
title_short	A distributed query execution engine of big attributed graphs
title_sort	distributed query execution engine of big attributed graphs
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4899405/ https://www.ncbi.nlm.nih.gov/pubmed/27350905 http://dx.doi.org/10.1186/s40064-016-2251-0
work_keys_str_mv	AT batarfiomar adistributedqueryexecutionengineofbigattributedgraphs AT elshawiradwa adistributedqueryexecutionengineofbigattributedgraphs AT fayoumiayman adistributedqueryexecutionengineofbigattributedgraphs AT barnawiahmed adistributedqueryexecutionengineofbigattributedgraphs AT sakrsherif adistributedqueryexecutionengineofbigattributedgraphs AT batarfiomar distributedqueryexecutionengineofbigattributedgraphs AT elshawiradwa distributedqueryexecutionengineofbigattributedgraphs AT fayoumiayman distributedqueryexecutionengineofbigattributedgraphs AT barnawiahmed distributedqueryexecutionengineofbigattributedgraphs AT sakrsherif distributedqueryexecutionengineofbigattributedgraphs
