R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets

The rapid growth of big spatial data urged the research community to develop several big spatial data systems. Regardless of their architecture, one of the fundamental requirements of all these systems is to spatially partition the data efficiently across machines. The core challenges of big spatial...

Bibliographic Details
Main Authors:	Vu, Tin, Eldawy, Ahmed
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2020
Subjects:	Big Data
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931855/ https://www.ncbi.nlm.nih.gov/pubmed/33693401 http://dx.doi.org/10.3389/fdata.2020.00028

_version_	1783660367883272192
author	Vu, Tin Eldawy, Ahmed
author_facet	Vu, Tin Eldawy, Ahmed
author_sort	Vu, Tin
collection	PubMed
description	The rapid growth of big spatial data has urged the research community to develop several big spatial data systems. Regardless of their architecture, one of the fundamental requirements of all these systems is to spatially partition the data efficiently across machines. The core challenges of big spatial partitioning are building partitions of high spatial quality while simultaneously taking advantage of distributed processing models by providing load-balanced partitions. Previous work on big spatial partitioning reuses existing index search trees as-is, e.g., the R-tree family, STR, Kd-tree, and Quad-tree, by building a temporary tree for a sample of the input and using its leaf nodes as partition boundaries. However, we show in this paper that none of those techniques has addressed the mentioned challenges completely. This paper proposes a novel partitioning method, termed R*-Grove, which can partition very large spatial datasets into high-quality partitions with excellent load balance and block utilization. This appealing property allows R*-Grove to outperform existing techniques in spatial query processing. R*-Grove can be easily integrated into any big data platform such as Apache Spark or Apache Hadoop. Our experiments show that R*-Grove outperforms the existing partitioning techniques for big spatial data systems. With all the proposed work publicly available as open source, we envision that R*-Grove will be adopted by the community to better serve big spatial data research.
format	Online Article Text
id	pubmed-7931855
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-7931855 2021-03-09 R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets Vu, Tin Eldawy, Ahmed Front Big Data Big Data Frontiers Media S.A. 2020-08-28 /pmc/articles/PMC7931855/ /pubmed/33693401 http://dx.doi.org/10.3389/fdata.2020.00028 Text en Copyright © 2020 Vu and Eldawy. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Big Data Vu, Tin Eldawy, Ahmed R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets
title	R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets
title_full	R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets
title_fullStr	R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets
title_full_unstemmed	R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets
title_short	R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets
title_sort	r*-grove: balanced spatial partitioning for large-scale datasets
topic	Big Data
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931855/ https://www.ncbi.nlm.nih.gov/pubmed/33693401 http://dx.doi.org/10.3389/fdata.2020.00028
work_keys_str_mv	AT vutin rgrovebalancedspatialpartitioningforlargescaledatasets AT eldawyahmed rgrovebalancedspatialpartitioningforlargescaledatasets

Similar Items