PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data
Remote sensing big data (RSBD) is generally characterized by huge volumes, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and a...
Main Authors: | Xia, Huiyu; Huang, Wei; Li, Ning; Zhou, Jianzhong; Zhang, Dongying |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2019 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6696378/ https://www.ncbi.nlm.nih.gov/pubmed/31387335 http://dx.doi.org/10.3390/s19153438 |
_version_ | 1783444256544784384 |
---|---|
author | Xia, Huiyu Huang, Wei Li, Ning Zhou, Jianzhong Zhang, Dongying |
author_facet | Xia, Huiyu Huang, Wei Li, Ning Zhou, Jianzhong Zhang, Dongying |
author_sort | Xia, Huiyu |
collection | PubMed |
description | Remote sensing big data (RSBD) is generally characterized by huge volumes, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and analyzing remote sensing imagery. However, conventional clustering algorithms are designed for relatively small datasets. When applied to problems with RSBD, they are, in general, too slow or inefficient for practical use. In this paper, we proposed a parallel subsampling-based clustering (PARSUC) method for improving the performance of RSBD clustering in terms of both efficiency and accuracy. PARSUC leverages a novel subsampling-based data partitioning (SubDP) method to realize three-step parallel clustering, effectively solving the notable performance bottleneck of the existing parallel clustering algorithms; that is, they must cope with numerous repeated calculations to get a reasonable result. Furthermore, we propose a centroid filtering algorithm (CFA) to eliminate subsampling errors and to guarantee the accuracy of the clustering results. PARSUC was implemented on a Hadoop platform by using the MapReduce parallel model. Experiments conducted on massive remote sensing imageries with different sizes showed that PARSUC (1) provided much better accuracy than conventional remote sensing clustering algorithms in handling larger image data; (2) achieved notable scalability with increased computing nodes added; and (3) spent much less time than the existing parallel clustering algorithm in handling RSBD. |
format | Online Article Text |
id | pubmed-6696378 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-66963782019-09-05 PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data Xia, Huiyu Huang, Wei Li, Ning Zhou, Jianzhong Zhang, Dongying Sensors (Basel) Article Remote sensing big data (RSBD) is generally characterized by huge volumes, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and analyzing remote sensing imagery. However, conventional clustering algorithms are designed for relatively small datasets. When applied to problems with RSBD, they are, in general, too slow or inefficient for practical use. In this paper, we proposed a parallel subsampling-based clustering (PARSUC) method for improving the performance of RSBD clustering in terms of both efficiency and accuracy. PARSUC leverages a novel subsampling-based data partitioning (SubDP) method to realize three-step parallel clustering, effectively solving the notable performance bottleneck of the existing parallel clustering algorithms; that is, they must cope with numerous repeated calculations to get a reasonable result. Furthermore, we propose a centroid filtering algorithm (CFA) to eliminate subsampling errors and to guarantee the accuracy of the clustering results. PARSUC was implemented on a Hadoop platform by using the MapReduce parallel model. Experiments conducted on massive remote sensing imageries with different sizes showed that PARSUC (1) provided much better accuracy than conventional remote sensing clustering algorithms in handling larger image data; (2) achieved notable scalability with increased computing nodes added; and (3) spent much less time than the existing parallel clustering algorithm in handling RSBD. MDPI 2019-08-05 /pmc/articles/PMC6696378/ /pubmed/31387335 http://dx.doi.org/10.3390/s19153438 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. 
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Xia, Huiyu Huang, Wei Li, Ning Zhou, Jianzhong Zhang, Dongying PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data |
title | PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data |
title_full | PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data |
title_fullStr | PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data |
title_full_unstemmed | PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data |
title_short | PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data |
title_sort | parsuc: a parallel subsampling-based method for clustering remote sensing big data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6696378/ https://www.ncbi.nlm.nih.gov/pubmed/31387335 http://dx.doi.org/10.3390/s19153438 |
work_keys_str_mv | AT xiahuiyu parsucaparallelsubsamplingbasedmethodforclusteringremotesensingbigdata AT huangwei parsucaparallelsubsamplingbasedmethodforclusteringremotesensingbigdata AT lining parsucaparallelsubsamplingbasedmethodforclusteringremotesensingbigdata AT zhoujianzhong parsucaparallelsubsamplingbasedmethodforclusteringremotesensingbigdata AT zhangdongying parsucaparallelsubsamplingbasedmethodforclusteringremotesensingbigdata |