
An Efficient Group-Based Replica Placement Policy for Large-Scale Geospatial 3D Raster Data on Hadoop



Bibliographic Details
Main authors: Liu, Zhipeng; Hua, Weihua; Liu, Xiuguo; Liang, Dong; Zhao, Yabo; Shi, Manxing
Format: Online, Article, Text
Language: English
Published: MDPI 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8662431/
https://www.ncbi.nlm.nih.gov/pubmed/34884135
http://dx.doi.org/10.3390/s21238132
Abstract: Geospatial three-dimensional (3D) raster data have been widely used for simple representations and analysis, such as geological models, spatio-temporal satellite data, hyperspectral images, and climate data. With the increasing requirements of resolution and accuracy, the amount of geospatial 3D raster data has grown exponentially. In recent years, the processing of large raster data using Hadoop has gained popularity. However, data uploaded to Hadoop are randomly distributed onto datanodes without consideration of the spatial characteristics. As a result, the direct processing of geospatial 3D raster data produces a massive network data exchange among the datanodes and degrades the performance of the cluster. To address this problem, we propose an efficient group-based replica placement policy for large-scale geospatial 3D raster data, aiming to optimize the locations of the replicas in the cluster to reduce the network overhead. An overlapped group scheme was designed for three replicas of each file. The data in each group were placed in the same datanode, and different colocation patterns for three replicas were implemented to further reduce the communication between groups. The experimental results show that our approach significantly reduces the network overhead during data acquisition for 3D raster data in the Hadoop cluster, and maintains the Hadoop replica placement requirements.
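The abstract does not specify how groups are formed or how the three colocation patterns differ, so the following is only a hypothetical Python sketch of one way an overlapped grouping could work, not the authors' policy. It assumes raster blocks are indexed on a regular (x, y, z) grid, that every replica uses the same group size `g` but shifts its group grid by `replica * g // 3` blocks per axis, and that datanodes are split into three pools (node id mod 3) so a block's replicas always land on distinct nodes. The names `group_of`, `spatial_hash`, `datanode_of`, `window`, and `best_replica`, the hash constants, and the pooling rule are all illustrative assumptions.

```python
from itertools import product
from typing import List, Set, Tuple

# Hypothetical sketch, not the paper's algorithm: overlapped grouping of
# 3D raster blocks with one shifted group grid per replica.
Block = Tuple[int, int, int]  # (x, y, z) index of a raster block
Group = Tuple[int, int, int]  # group index along each axis

NUM_REPLICAS = 3  # HDFS default replication factor


def group_of(block: Block, replica: int, g: int) -> Group:
    """Group containing `block` in replica `replica`'s (assumed) shifted grid."""
    shift = replica * g // NUM_REPLICAS  # assumed overlap: about 0, g/3, 2g/3 blocks
    x, y, z = block
    return ((x + shift) // g, (y + shift) // g, (z + shift) // g)


def spatial_hash(group: Group) -> int:
    """Deterministic 3D spatial hash; any stable hash would work here."""
    x, y, z = group
    return (x * 73856093) ^ (y * 19349663) ^ (z * 83492791)


def datanode_of(block: Block, replica: int, g: int, num_nodes: int) -> int:
    """Datanode for one replica of `block`; a whole group shares one node.

    Assumed pooling: replica r only uses nodes whose id % 3 == r, so the
    three replicas of any block always sit on three distinct datanodes.
    """
    pool_size = num_nodes // NUM_REPLICAS  # requires num_nodes >= 3
    slot = spatial_hash(group_of(block, replica, g)) % pool_size
    return slot * NUM_REPLICAS + replica


def window(origin: Block, size: int) -> List[Block]:
    """All blocks in a cubic query window with `size` blocks per side."""
    ox, oy, oz = origin
    return [(ox + dx, oy + dy, oz + dz)
            for dx, dy, dz in product(range(size), repeat=3)]


def best_replica(blocks: List[Block], g: int, num_nodes: int) -> Tuple[int, Set[int]]:
    """Replica whose grouping spreads the window over the fewest datanodes."""
    best_r, best_nodes = 0, {datanode_of(b, 0, g, num_nodes) for b in blocks}
    for r in range(1, NUM_REPLICAS):
        nodes = {datanode_of(b, r, g, num_nodes) for b in blocks}
        if len(nodes) < len(best_nodes):
            best_r, best_nodes = r, nodes
    return best_r, best_nodes


if __name__ == "__main__":
    g, num_nodes = 4, 9
    query = window((3, 3, 3), 2)  # 2x2x2 window straddling a group corner
    for r in range(NUM_REPLICAS):
        # replica 0 touches 8 groups; replicas 1 and 2 touch 1 each
        print(r, len({group_of(b, r, g) for b in query}))
    print(best_replica(query, g, num_nodes))
```

Under these assumptions, a 2×2×2 window at a group corner spans 8 groups in replica 0 but a single group in replicas 1 and 2, so a reader that picks replica 1 can fetch every block from one datanode. The sketch ignores rack awareness and load balancing, which a real HDFS placement policy would also have to respect.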
Journal: Sensors (Basel)
Published online: 2021-12-05
License: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).