A Parallel Computing Approach to Spatial Neighboring Analysis of Large Amounts of Terrain Data Using Spark
Spatial neighboring analysis is an indispensable part of geo-raster spatial analysis. In the big data era, high-resolution raster data offer us abundant and valuable information, and also bring enormous computational challenges to the existing focal statistics algorithms. Simply employing the in-mem...
Main Authors: | Zhang, Jianbo; Ye, Zhuangzhuang; Zheng, Kai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7827788/ https://www.ncbi.nlm.nih.gov/pubmed/33430375 http://dx.doi.org/10.3390/s21020365 |
_version_ | 1783640852718944256 |
---|---|
author | Zhang, Jianbo Ye, Zhuangzhuang Zheng, Kai |
author_sort | Zhang, Jianbo |
collection | PubMed |
description | Spatial neighboring analysis is an indispensable part of geo-raster spatial analysis. In the big data era, high-resolution raster data offer us abundant and valuable information, and also bring enormous computational challenges to the existing focal statistics algorithms. Simply employing the in-memory computing framework Spark to serve such applications might incur performance issues due to its lack of native support for spatial data. In this article, we present a Spark-based parallel computing approach for the focal algorithms of neighboring analysis. This approach implements efficient manipulation of large amounts of terrain data through three steps: (1) partitioning a raster digital elevation model (DEM) file into multiple square tile files by adopting a tile-based multifile storing strategy suitable for the Hadoop Distributed File System (HDFS), (2) performing the quintessential slope algorithm on these tile files using a dynamic calculation window (DCW) computing strategy, and (3) writing back and merging the calculation results into a whole raster file. Experiments with the digital elevation data of Australia show that the proposed computing approach can effectively improve the parallel performance of focal statistics algorithms. The results also show that the approach has almost the same calculation accuracy as that of ArcGIS. The proposed approach also exhibits good scalability when the number of Spark executors in clusters is increased. |
format | Online Article Text |
id | pubmed-7827788 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7827788 2021-01-25 A Parallel Computing Approach to Spatial Neighboring Analysis of Large Amounts of Terrain Data Using Spark Zhang, Jianbo Ye, Zhuangzhuang Zheng, Kai Sensors (Basel) Article MDPI 2021-01-07 /pmc/articles/PMC7827788/ /pubmed/33430375 http://dx.doi.org/10.3390/s21020365 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Zhang, Jianbo Ye, Zhuangzhuang Zheng, Kai A Parallel Computing Approach to Spatial Neighboring Analysis of Large Amounts of Terrain Data Using Spark |
title | A Parallel Computing Approach to Spatial Neighboring Analysis of Large Amounts of Terrain Data Using Spark |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7827788/ https://www.ncbi.nlm.nih.gov/pubmed/33430375 http://dx.doi.org/10.3390/s21020365 |
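
The three steps named in the description (tile partitioning, per-tile slope calculation, and merging) can be illustrated with a minimal, single-machine sketch. This is not the authors' implementation: the record does not define the dynamic calculation window (DCW) strategy, so the sketch swaps in a simpler one-cell halo around each tile. Horn's 3x3 finite-difference formula is assumed for slope, edge cells are replicated at the raster border, and all function names are hypothetical.

```python
import math


def horn_slope_deg(w, cell_size):
    """Slope (degrees) at the centre of a 3x3 window w, using Horn's method."""
    (a, b, c), (d, _, f), (g, h, i) = w
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


def clamp(v, n):
    """Clamp index v into [0, n - 1]; this replicates edge cells (an assumption)."""
    return min(max(v, 0), n - 1)


def split_with_halo(dem, tile):
    """Step 1: yield (row0, col0, padded) for each square tile plus a one-cell halo."""
    rows, cols = len(dem), len(dem[0])
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r1, c1 = min(r0 + tile, rows), min(c0 + tile, cols)
            padded = [[dem[clamp(r, rows)][clamp(c, cols)]
                       for c in range(c0 - 1, c1 + 1)]
                      for r in range(r0 - 1, r1 + 1)]
            yield r0, c0, padded


def slope_of_tile(padded, cell_size):
    """Step 2: slope for every non-halo cell of one padded tile."""
    return [[horn_slope_deg([row[c - 1:c + 2] for row in padded[r - 1:r + 2]],
                            cell_size)
             for c in range(1, len(padded[0]) - 1)]
            for r in range(1, len(padded) - 1)]


def tiled_slope(dem, tile, cell_size):
    """Steps 1-3: split, compute per tile, and merge back into one grid."""
    out = [[0.0] * len(dem[0]) for _ in dem]
    for r0, c0, padded in split_with_halo(dem, tile):
        for dr, row in enumerate(slope_of_tile(padded, cell_size)):
            out[r0 + dr][c0:c0 + len(row)] = row
    return out


def whole_slope(dem, cell_size):
    """Reference: the same computation on the undivided raster (one big tile)."""
    (_, _, padded), = split_with_halo(dem, max(len(dem), len(dem[0])))
    return slope_of_tile(padded, cell_size)
```

Because every tile carries its own halo, each call to `slope_of_tile` is independent of the others. In a Spark setting the tiles could plausibly be distributed as elements of an RDD and mapped with such a function before merging, though the record does not say how the authors structure this.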