R*-Grove: Balanced Spatial Partitioning for Large-Scale Datasets
Main authors: | Vu, Tin; Eldawy, Ahmed |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2020 |
Subjects: | Big Data |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931855/ https://www.ncbi.nlm.nih.gov/pubmed/33693401 http://dx.doi.org/10.3389/fdata.2020.00028 |
author | Vu, Tin Eldawy, Ahmed |
collection | PubMed |
description | The rapid growth of big spatial data urged the research community to develop several big spatial data systems. Regardless of their architecture, one of the fundamental requirements of all these systems is to spatially partition the data efficiently across machines. The core challenges of big spatial partitioning are building partitions of high spatial quality while simultaneously taking advantage of distributed processing models by providing load-balanced partitions. Previous work on big spatial partitioning reuses existing index search trees as-is, e.g., the R-tree family, STR, Kd-tree, and Quad-tree, by building a temporary tree for a sample of the input and using its leaf nodes as partition boundaries. However, we show in this paper that none of those techniques has addressed the mentioned challenges completely. This paper proposes a novel partitioning method, termed R*-Grove, which can partition very large spatial datasets into high-quality partitions with excellent load balance and block utilization. This appealing property allows R*-Grove to outperform existing techniques in spatial query processing. R*-Grove can be easily integrated into any big data platform such as Apache Spark or Apache Hadoop. Our experiments show that R*-Grove outperforms the existing partitioning techniques for big spatial data systems. With all the proposed work publicly available as open source, we envision that R*-Grove will be adopted by the community to better serve big spatial data research. |
id | pubmed-7931855 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Front Big Data |
published | 2020-08-28 |
rights | Copyright © 2020 Vu and Eldawy. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
topic | Big Data |
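The abstract describes the baseline that earlier systems use: build a temporary index over a sample of the input and treat its leaf nodes as partition boundaries. The following is a minimal Python sketch of that baseline, using a simplified STR-style split (sort the sample by x into vertical slices, then split each slice by y). It is not R*-Grove, whose algorithm the abstract does not describe. The function names (`str_boundaries`, `assign`, `load_imbalance`), the 500-point sample, the 16 target partitions and the synthetic dataset are hypothetical choices made for illustration only.

```python
import bisect
import random
from collections import Counter


def str_boundaries(sample, num_partitions):
    """Derive simplified STR-style cut lines from a sample of (x, y) points.

    Hypothetical helper. Returns (x_cuts, y_cuts): x_cuts splits the x-axis
    into vertical slices, and y_cuts[i] holds the sorted y cut values for slice i.
    """
    if not sample:
        raise ValueError("sample must not be empty")
    slices = max(1, round(num_partitions ** 0.5))
    pts = sorted(sample)                       # sort by x (ties broken by y)
    per_slice = -(-len(pts) // slices)         # ceiling division
    chunks = [pts[i:i + per_slice] for i in range(0, len(pts), per_slice)]
    # The first x of every slice after the first becomes a vertical cut line.
    x_cuts = [chunk[0][0] for chunk in chunks[1:]]
    y_cuts = []
    for chunk in chunks:
        ys = sorted(p[1] for p in chunk)
        per_cell = -(-len(ys) // slices)       # sample points per cell in this slice
        y_cuts.append([ys[j] for j in range(per_cell, len(ys), per_cell)])
    return x_cuts, y_cuts


def assign(point, x_cuts, y_cuts):
    """Map a point to its (slice, cell) partition id by binary search.

    Edge partitions are unbounded, so points outside the sample's extent
    still land in the first or last slice/cell.
    """
    i = bisect.bisect_right(x_cuts, point[0])
    j = bisect.bisect_right(y_cuts[i], point[1])
    return i, j


def load_imbalance(sizes):
    """Largest partition size divided by the mean size (1.0 = perfect balance)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean


def main():
    rng = random.Random(42)
    # Hypothetical skewed dataset: a dense cluster plus uniform background noise.
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(9000)]
    data += [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(1000)]
    sample = rng.sample(data, 500)             # stands in for the system's input sample
    x_cuts, y_cuts = str_boundaries(sample, num_partitions=16)
    counts = Counter(assign(p, x_cuts, y_cuts) for p in data)
    sizes = list(counts.values())
    print(f"{len(sizes)} partitions, sizes {min(sizes)}..{max(sizes)}, "
          f"imbalance {load_imbalance(sizes):.2f}")
    return counts


if __name__ == "__main__":
    main()
```

Because the cut lines come only from the sample, any skew in the full dataset that the sample misrepresents appears as an imbalance ratio above 1.0. Measuring that ratio is one simple way to compare partitioners on the load-balance criterion the abstract emphasizes.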