
PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data

Remote sensing big data (RSBD) is generally characterized by huge volumes, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and a...


Bibliographic Details
Main authors: Xia, Huiyu, Huang, Wei, Li, Ning, Zhou, Jianzhong, Zhang, Dongying
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6696378/
https://www.ncbi.nlm.nih.gov/pubmed/31387335
http://dx.doi.org/10.3390/s19153438
author Xia, Huiyu
Huang, Wei
Li, Ning
Zhou, Jianzhong
Zhang, Dongying
collection PubMed
description Remote sensing big data (RSBD) is generally characterized by huge volumes, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and analyzing remote sensing imagery. However, conventional clustering algorithms are designed for relatively small datasets. When applied to problems with RSBD, they are, in general, too slow or inefficient for practical use. In this paper, we propose a parallel subsampling-based clustering (PARSUC) method for improving the performance of RSBD clustering in terms of both efficiency and accuracy. PARSUC leverages a novel subsampling-based data partitioning (SubDP) method to realize three-step parallel clustering, effectively solving the notable performance bottleneck of the existing parallel clustering algorithms; that is, they must cope with numerous repeated calculations to get a reasonable result. Furthermore, we propose a centroid filtering algorithm (CFA) to eliminate subsampling errors and to guarantee the accuracy of the clustering results. PARSUC was implemented on a Hadoop platform by using the MapReduce parallel model. Experiments conducted on massive remote sensing images of different sizes showed that PARSUC (1) provided much better accuracy than conventional remote sensing clustering algorithms in handling larger image data; (2) achieved notable scalability as computing nodes were added; and (3) took much less time than the existing parallel clustering algorithm in handling RSBD.
format Online
Article
Text
id pubmed-6696378
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6696378 2019-09-05 PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data Xia, Huiyu Huang, Wei Li, Ning Zhou, Jianzhong Zhang, Dongying Sensors (Basel) Article MDPI 2019-08-05 /pmc/articles/PMC6696378/ /pubmed/31387335 http://dx.doi.org/10.3390/s19153438 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. 
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data
topic Article
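
The abstract in this record outlines a three-step scheme: partition the data by subsampling, cluster the partitions in parallel, then filter and combine the resulting centroids. The record does not say how SubDP or CFA actually work, so the Python sketch below is only an illustration of that general idea and not the authors' method. Every function name is hypothetical, plain k-means is assumed as the per-partition clusterer, the filter is an assumed nearest-neighbour outlier rule standing in for CFA, and a thread pool on one machine stands in for MapReduce map tasks on Hadoop.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in); returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the previous centroid when a group ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def filter_centroids(centroids, factor=2.0):
    """Hypothetical filter (NOT the paper's CFA): drop isolated centroids.

    A centroid is dropped when its squared distance to the nearest other
    centroid exceeds `factor` times the median of those distances.
    """
    nn = [
        min(dist2(c, o) for j, o in enumerate(centroids) if j != i)
        for i, c in enumerate(centroids)
    ]
    cutoff = factor * median(nn)
    return [c for c, d in zip(centroids, nn) if d <= cutoff]


def subsample_cluster(points, k, n_parts=4, seed=0):
    """Three steps: subsample into partitions, cluster each, merge centroids."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    # Step 1: disjoint random subsamples (assumed partitioning rule).
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    # Step 2: cluster each partition independently; the thread pool only
    # illustrates the map phase and is not a Hadoop job.
    with ThreadPoolExecutor() as pool:
        local = list(pool.map(lambda part: kmeans(part, k, seed=seed), parts))
    candidates = [c for cs in local for c in cs]
    # Step 3: filter candidate centroids, then reduce them to k final ones.
    kept = filter_centroids(candidates)
    if len(kept) < k:
        kept = candidates  # fall back if the filter was too aggressive
    final = kmeans(kept, k, seed=seed)
    labels = [min(range(k), key=lambda i: dist2(p, final[i])) for p in points]
    return final, labels
```

Calling `subsample_cluster(points, k=3)` on a list of coordinate tuples returns the final centroids and one label per input point. Each partition needs at least `k` points, and the quality of the result depends on the k-means initialisation, so this sketch makes no claim about the accuracy or speed figures reported in the abstract.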