A Parallel Computing Approach to Spatial Neighboring Analysis of Large Amounts of Terrain Data Using Spark

Bibliographic Details
Main authors: Zhang, Jianbo; Ye, Zhuangzhuang; Zheng, Kai
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7827788/
https://www.ncbi.nlm.nih.gov/pubmed/33430375
http://dx.doi.org/10.3390/s21020365
Collection: PubMed
Description: Spatial neighboring analysis is an indispensable part of geo-raster spatial analysis. In the big data era, high-resolution raster data offer us abundant and valuable information, and also bring enormous computational challenges to the existing focal statistics algorithms. Simply employing the in-memory computing framework Spark to serve such applications might incur performance issues due to its lack of native support for spatial data. In this article, we present a Spark-based parallel computing approach for the focal algorithms of neighboring analysis. This approach implements efficient manipulation of large amounts of terrain data through three steps: (1) partitioning a raster digital elevation model (DEM) file into multiple square tile files by adopting a tile-based multifile storing strategy suitable for the Hadoop Distributed File System (HDFS), (2) performing the quintessential slope algorithm on these tile files using a dynamic calculation window (DCW) computing strategy, and (3) writing back and merging the calculation results into a whole raster file. Experiments with the digital elevation data of Australia show that the proposed computing approach can effectively improve the parallel performance of focal statistics algorithms. The results also show that the approach has almost the same calculation accuracy as that of ArcGIS. The proposed approach also exhibits good scalability when the number of Spark executors in clusters is increased.
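
The three steps in the description (tile the DEM, compute slope per tile, merge the results) can be illustrated with a minimal pure-Python sketch. It assumes a small in-memory DEM stored as nested lists, square cells of a single size, square tiles padded with a one-cell halo so that each tile can be processed on its own, and the 3×3 finite-difference slope estimator (Horn's method) that ArcGIS documents for its Slope tool. Border cells without a full 3×3 window are simply left empty. The halo padding is a common substitute chosen here because the abstract does not say how the dynamic calculation window (DCW) strategy handles tile edges, so this is not the authors' DCW method. The helper names (`horn_slope`, `slope_whole`, `split_tiles`, `slope_tile`, `merge`) are hypothetical, and no Spark or HDFS code is involved.

```python
"""Tiled slope computation: a toy stand-in for a split / compute / merge pipeline."""
import math
from typing import Iterator, List, Optional, Tuple

Grid = List[List[float]]
SlopeGrid = List[List[Optional[float]]]
# (core row, core col, halo row offset, halo col offset, halo-padded tile)
Tile = Tuple[int, int, int, int, Grid]


def horn_slope(win: Grid, cell: float) -> float:
    """Slope (degrees) at the centre of a 3x3 window, Horn's finite differences."""
    (a, b, c), (d, _e, f), (g, h, i) = win
    dzdx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell)
    dzdy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell)
    return math.degrees(math.atan(math.hypot(dzdx, dzdy)))


def slope_whole(dem: Grid, cell: float) -> SlopeGrid:
    """Reference: slope of the untiled raster; border cells (no full window) are None."""
    rows, cols = len(dem), len(dem[0])
    out: SlopeGrid = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = horn_slope([row[c - 1:c + 2] for row in dem[r - 1:r + 2]], cell)
    return out


def split_tiles(dem: Grid, size: int) -> Iterator[Tile]:
    """Step 1 (sketch): cut the DEM into size x size tiles, each padded with a
    one-cell halo copied from its neighbors and clipped at the raster border."""
    rows, cols = len(dem), len(dem[0])
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            hr0, hc0 = max(r0 - 1, 0), max(c0 - 1, 0)
            hr1, hc1 = min(r0 + size + 1, rows), min(c0 + size + 1, cols)
            yield r0, c0, hr0, hc0, [row[hc0:hc1] for row in dem[hr0:hr1]]


def slope_tile(tile: Tile, size: int, rows: int, cols: int, cell: float):
    """Step 2 (sketch): slope for the core cells of one tile, reading only that tile."""
    r0, c0, hr0, hc0, halo = tile
    core: SlopeGrid = []
    for r in range(r0, min(r0 + size, rows)):
        line: List[Optional[float]] = []
        for c in range(c0, min(c0 + size, cols)):
            if 0 < r < rows - 1 and 0 < c < cols - 1:
                lr, lc = r - hr0, c - hc0  # position inside the padded tile
                win = [row[lc - 1:lc + 2] for row in halo[lr - 1:lr + 2]]
                line.append(horn_slope(win, cell))
            else:
                line.append(None)  # global raster border: no full 3x3 window
        core.append(line)
    return r0, c0, core


def merge(results, rows: int, cols: int) -> SlopeGrid:
    """Step 3 (sketch): write each tile's core cells back into one output raster."""
    out: SlopeGrid = [[None] * cols for _ in range(rows)]
    for r0, c0, core in results:
        for dr, line in enumerate(core):
            out[r0 + dr][c0:c0 + len(line)] = line
    return out


if __name__ == "__main__":
    dem = [[float((r * 7 + c * 3) % 11 + r) for c in range(10)] for r in range(9)]
    rows, cols, cell, size = len(dem), len(dem[0]), 30.0, 4
    tiled = merge((slope_tile(t, size, rows, cols, cell) for t in split_tiles(dem, size)),
                  rows, cols)
    print(tiled == slope_whole(dem, cell))  # tiling should not change the result
```

Because each tile carries the neighboring row and column it needs, `slope_tile` never reads another tile, which is what would allow the per-tile step to run as an independent task. In PySpark, for example, that step could be mapped over `sc.parallelize(list(split_tiles(dem, size)))` (with `sc` a `SparkContext`) before collecting the results for `merge`. The final comparison against `slope_whole` checks that tiling leaves the computed slopes unchanged.
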
ID: pubmed-7827788
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Sensors (Basel), Article, published online 2021-01-07
License: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).