
An incremental clustering method based on the boundary profile

Many important applications continuously generate data, such as financial transaction administration, satellite monitoring, network flow monitoring, and web information processing. The data mining results are always evolving with the newly generated data. Obviously, for the clustering task, it is be...


Bibliographic Details
Main authors: Bao, Junpeng, Wang, Wenqing, Yang, Tianshe, Wu, Guan
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909898/
https://www.ncbi.nlm.nih.gov/pubmed/29677201
http://dx.doi.org/10.1371/journal.pone.0196108
_version_ 1783315968136577024
author Bao, Junpeng
Wang, Wenqing
Yang, Tianshe
Wu, Guan
author_facet Bao, Junpeng
Wang, Wenqing
Yang, Tianshe
Wu, Guan
author_sort Bao, Junpeng
collection PubMed
description Many important applications continuously generate data, such as financial transaction administration, satellite monitoring, network flow monitoring, and web information processing. The data mining results keep evolving with the newly generated data. For the clustering task, it is better to incrementally update the clustering results based on the old data than to recluster all of the data from scratch. The incremental clustering approach is an essential way to solve the problem of clustering with growing Big Data. This paper proposes a boundary-profile-based incremental clustering (BPIC) method to find arbitrarily shaped clusters in dynamically growing datasets. This method represents the existing clustering results with a collection of boundary profiles and discards the inner points of clusters rather than keeping all data, which greatly reduces both time and storage costs. To identify the boundary profile, this paper presents a boundary-vector-based boundary point detection (BV-BPD) algorithm that summarizes the structure of the existing clusters. The BPIC method processes each new point in an online fashion and updates the clustering results in a batch mode. When a new point arrives, the BPIC method either immediately labels it or temporarily puts it into a bucket, according to the relationship between the new point and the boundary profiles. The bucket is employed to distinguish noise from the potential seeds of new clusters and to alleviate the effects of data order. When the bucket is full, the BPIC method clusters the data within it and updates the clustering results. Thus, the BPIC method is insensitive to noise and to the order of new data, which is critical for the robustness of the incremental clustering process. In the experiments, the boundary point detection algorithm BV-BPD is compared with the state-of-the-art method, and the results show that BV-BPD performs better. Additionally, the performance of BPIC and two other incremental clustering methods is investigated in terms of clustering quality and time and space efficiency. The experimental results indicate that the BPIC method is able to obtain a qualified clustering result on a large dataset with higher time and space efficiency.
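The label-or-bucket flow described in the abstract can be illustrated with a short sketch. This is not the authors' BPIC implementation: the class name, the distance threshold `eps`, the `bucket_size` and `min_cluster_size` parameters, and the connected-components grouping of the bucket are all assumptions made for illustration, and the BV-BPD boundary detection step is omitted (new clusters keep all their points instead of only a boundary profile).

```python
import math


class BucketedIncrementalClusterer:
    """Toy label-or-bucket loop (hypothetical; not the BPIC algorithm itself)."""

    def __init__(self, profiles, eps, bucket_size, min_cluster_size):
        # profiles: {int label: [points summarizing that cluster]} (assumed layout)
        self.profiles = {k: list(v) for k, v in profiles.items()}
        self.eps = eps                      # assumed distance threshold
        self.bucket_size = bucket_size      # bucket capacity before a batch update
        self.min_cluster_size = min_cluster_size
        self.bucket = []
        self.next_label = max(self.profiles, default=-1) + 1

    def _nearest_label(self, point):
        """Return the label of the closest profile point if within eps, else None."""
        best_label, best_dist = None, math.inf
        for label, pts in self.profiles.items():
            for p in pts:
                d = math.dist(point, p)
                if d < best_dist:
                    best_label, best_dist = label, d
        return best_label if best_dist <= self.eps else None

    def add(self, point):
        """Label the point immediately, or defer it to the bucket (returns None)."""
        label = self._nearest_label(point)
        if label is not None:
            return label
        self.bucket.append(point)
        if len(self.bucket) >= self.bucket_size:
            self._flush()
        return None

    def _flush(self):
        """Batch step: group bucket points into eps-connected components."""
        points, self.bucket = self.bucket, []
        seen = set()
        for start in range(len(points)):
            if start in seen:
                continue
            seen.add(start)
            stack, group = [start], []
            while stack:
                i = stack.pop()
                group.append(points[i])
                for j in range(len(points)):
                    if j not in seen and math.dist(points[i], points[j]) <= self.eps:
                        seen.add(j)
                        stack.append(j)
            if len(group) >= self.min_cluster_size:
                # Large enough: treat the group as the seed of a new cluster.
                self.profiles[self.next_label] = group
                self.next_label += 1
            # Smaller groups are treated as noise and dropped in this sketch.
```

In this sketch a point near an existing profile is labeled at once, while isolated points wait in the bucket until enough of them accumulate to be grouped, so a single outlier never creates a cluster on its own.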
format Online
Article
Text
id pubmed-5909898
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5909898 2018-05-05 An incremental clustering method based on the boundary profile Bao, Junpeng Wang, Wenqing Yang, Tianshe Wu, Guan PLoS One Research Article Public Library of Science 2018-04-20 /pmc/articles/PMC5909898/ /pubmed/29677201 http://dx.doi.org/10.1371/journal.pone.0196108 Text en © 2018 Bao et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Bao, Junpeng
Wang, Wenqing
Yang, Tianshe
Wu, Guan
An incremental clustering method based on the boundary profile
title An incremental clustering method based on the boundary profile
title_sort incremental clustering method based on the boundary profile
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909898/
https://www.ncbi.nlm.nih.gov/pubmed/29677201
http://dx.doi.org/10.1371/journal.pone.0196108