
R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets

The rapid growth of big spatial data urged the research community to develop several big spatial data systems. Regardless of their architecture, one of the fundamental requirements of all these systems is to spatially partition the data efficiently across machines. The core challenges of big spatial...


Bibliographic Details
Main Authors: Vu, Tin, Eldawy, Ahmed
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931855/
https://www.ncbi.nlm.nih.gov/pubmed/33693401
http://dx.doi.org/10.3389/fdata.2020.00028
_version_ 1783660367883272192
author Vu, Tin
Eldawy, Ahmed
author_facet Vu, Tin
Eldawy, Ahmed
author_sort Vu, Tin
collection PubMed
description The rapid growth of big spatial data urged the research community to develop several big spatial data systems. Regardless of their architecture, one of the fundamental requirements of all these systems is to spatially partition the data efficiently across machines. The core challenges of big spatial partitioning are building partitions of high spatial quality while simultaneously taking advantage of distributed processing models by providing load-balanced partitions. Previous work on big spatial partitioning reuses existing index search trees as-is, e.g., the R-tree family, STR, Kd-tree, and Quad-tree, by building a temporary tree for a sample of the input and using its leaf nodes as partition boundaries. However, we show in this paper that none of those techniques fully addresses these challenges. This paper proposes a novel partitioning method, termed R*-Grove, which can partition very large spatial datasets into high-quality partitions with excellent load balance and block utilization. These properties allow R*-Grove to outperform existing techniques in spatial query processing. R*-Grove can be easily integrated into any big data platform, such as Apache Spark or Apache Hadoop. Our experiments show that R*-Grove outperforms the existing partitioning techniques for big spatial data systems. With all the proposed work publicly available as open source, we envision that R*-Grove will be adopted by the community to better serve big spatial data research.
format Online
Article
Text
id pubmed-7931855
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7931855 2021-03-09 R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets Vu, Tin Eldawy, Ahmed Front Big Data Big Data Frontiers Media S.A. 2020-08-28 /pmc/articles/PMC7931855/ /pubmed/33693401 http://dx.doi.org/10.3389/fdata.2020.00028 Text en Copyright © 2020 Vu and Eldawy. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Big Data
Vu, Tin
Eldawy, Ahmed
R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets
title R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets
title_full R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets
title_fullStr R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets
title_full_unstemmed R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets
title_short R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets
title_sort r*-grove: balanced spatial partitioning for large-scale datasets
topic Big Data
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931855/
https://www.ncbi.nlm.nih.gov/pubmed/33693401
http://dx.doi.org/10.3389/fdata.2020.00028
work_keys_str_mv AT vutin rgrovebalancedspatialpartitioningforlargescaledatasets
AT eldawyahmed rgrovebalancedspatialpartitioningforlargescaledatasets