
A Parallel Computing Approach to Spatial Neighboring Analysis of Large Amounts of Terrain Data Using Spark

Spatial neighboring analysis is an indispensable part of geo-raster spatial analysis. In the big data era, high-resolution raster data offer us abundant and valuable information, and also bring enormous computational challenges to the existing focal statistics algorithms. Simply employing the in-mem...


Bibliographic Details
Main Authors:	Zhang, Jianbo, Ye, Zhuangzhuang, Zheng, Kai
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7827788/ https://www.ncbi.nlm.nih.gov/pubmed/33430375 http://dx.doi.org/10.3390/s21020365

_version_	1783640852718944256
author	Zhang, Jianbo Ye, Zhuangzhuang Zheng, Kai
author_facet	Zhang, Jianbo Ye, Zhuangzhuang Zheng, Kai
author_sort	Zhang, Jianbo
collection	PubMed
description	Spatial neighboring analysis is an indispensable part of geo-raster spatial analysis. In the big data era, high-resolution raster data offer us abundant and valuable information, and also bring enormous computational challenges to the existing focal statistics algorithms. Simply employing the in-memory computing framework Spark to serve such applications might incur performance issues due to its lack of native support for spatial data. In this article, we present a Spark-based parallel computing approach for the focal algorithms of neighboring analysis. This approach implements efficient manipulation of large amounts of terrain data through three steps: (1) partitioning a raster digital elevation model (DEM) file into multiple square tile files by adopting a tile-based multifile storing strategy suitable for the Hadoop Distributed File System (HDFS), (2) performing the quintessential slope algorithm on these tile files using a dynamic calculation window (DCW) computing strategy, and (3) writing back and merging the calculation results into a whole raster file. Experiments with the digital elevation data of Australia show that the proposed computing approach can effectively improve the parallel performance of focal statistics algorithms. The results also show that the approach has almost the same calculation accuracy as that of ArcGIS. The proposed approach also exhibits good scalability when the number of Spark executors in clusters is increased.
format	Online Article Text
id	pubmed-7827788
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7827788 2021-01-25 A Parallel Computing Approach to Spatial Neighboring Analysis of Large Amounts of Terrain Data Using Spark Zhang, Jianbo Ye, Zhuangzhuang Zheng, Kai Sensors (Basel) Article MDPI 2021-01-07 /pmc/articles/PMC7827788/ /pubmed/33430375 http://dx.doi.org/10.3390/s21020365 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Zhang, Jianbo Ye, Zhuangzhuang Zheng, Kai A Parallel Computing Approach to Spatial Neighboring Analysis of Large Amounts of Terrain Data Using Spark
title	A Parallel Computing Approach to Spatial Neighboring Analysis of Large Amounts of Terrain Data Using Spark
title_sort	parallel computing approach to spatial neighboring analysis of large amounts of terrain data using spark
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7827788/ https://www.ncbi.nlm.nih.gov/pubmed/33430375 http://dx.doi.org/10.3390/s21020365
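
Illustration of the three steps named in the description (tile partitioning, per-tile slope, merging). This is only a minimal single-machine sketch in plain Python, not the authors' Spark implementation. The record does not describe how the dynamic calculation window (DCW) works, so the sketch substitutes a simple one-cell halo read around each tile. The slope formula is Horn's 3x3 finite-difference method, chosen here as an assumption. The function names, the tile size, and edge clamping are all hypothetical.

```python
import math


def horn_slope(win, cell):
    """Slope in degrees at the centre of a 3x3 window (Horn's method)."""
    (a, b, c), (d, _, f), (g, h, i) = win
    dzdx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell)
    dzdy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell)
    return math.degrees(math.atan(math.hypot(dzdx, dzdy)))


def slope_block(dem, r0, r1, c0, c1, cell):
    """Slope for rows r0..r1-1 and columns c0..c1-1 of dem.

    Neighbours one cell outside the block are read from dem (the halo);
    at the raster border the nearest valid cell is used instead (clamping).
    """
    rows, cols = len(dem), len(dem[0])

    def z(r, c):
        return dem[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    block = []
    for r in range(r0, r1):
        row = []
        for c in range(c0, c1):
            win = [[z(r + dr, c + dc) for dc in (-1, 0, 1)] for dr in (-1, 0, 1)]
            row.append(horn_slope(win, cell))
        block.append(row)
    return block


def slope_tiled(dem, tile, cell):
    """Split dem into square tiles, compute slope per tile, merge the results."""
    rows, cols = len(dem), len(dem[0])
    merged = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, tile):          # step 1: tile partitioning
        for c0 in range(0, cols, tile):
            r1, c1 = min(r0 + tile, rows), min(c0 + tile, cols)
            block = slope_block(dem, r0, r1, c0, c1, cell)  # step 2: per-tile slope
            for k, row in enumerate(block):  # step 3: write back and merge
                merged[r0 + k][c0:c1] = row
    return merged


# Usage: a ramp rising one unit per cell in the column direction.
ramp = [[float(c) for c in range(6)] for _ in range(5)]
slopes = slope_tiled(ramp, tile=2, cell=1.0)
```

Because each tile reads its halo from the neighbouring cells, the merged output matches a single pass over the whole raster. In a distributed setting the halo values would have to come from adjacent tile files rather than from one in-memory array.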
