Detecting Arbitrarily Oriented Subspace Clusters in Data Streams Using Hough Transform

Bibliographic Details
Main Authors: Borutta, Felix, Kazempour, Daniyal, Mathy, Felix, Kröger, Peer, Seidl, Thomas
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206268/
http://dx.doi.org/10.1007/978-3-030-47426-3_28
_version_ 1783530381930135552
author Borutta, Felix
Kazempour, Daniyal
Mathy, Felix
Kröger, Peer
Seidl, Thomas
author_facet Borutta, Felix
Kazempour, Daniyal
Mathy, Felix
Kröger, Peer
Seidl, Thomas
author_sort Borutta, Felix
collection PubMed
description When facing high-dimensional data streams, clustering algorithms quickly reach the boundaries of their usefulness as most of these methods are not designed to deal with the curse of dimensionality. Due to inherent sparsity in high-dimensional data, distances between objects tend to become meaningless since the distances between any two objects measured in the full dimensional space tend to become the same for all pairs of objects. In this work, we present a novel oriented subspace clustering algorithm that is able to deal with such issues and detects arbitrarily oriented subspace clusters in high-dimensional data streams. Data streams generally implicate the challenge that the data cannot be stored entirely and hence there is a general demand for suitable data handling strategies for clustering algorithms such that the data can be processed within a single scan. We therefore propose the CashStream algorithm that unites state-of-the-art stream processing techniques and additionally relies on the Hough transform to detect arbitrarily oriented subspace clusters. Our experiments compare CashStream to its static counterpart and show that the amount of consumed memory is significantly decreased while there is no loss in terms of runtime. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this chapter (10.1007/978-3-030-47426-3_28) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-7206268
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-72062682020-05-08 Detecting Arbitrarily Oriented Subspace Clusters in Data Streams Using Hough Transform Borutta, Felix Kazempour, Daniyal Mathy, Felix Kröger, Peer Seidl, Thomas Advances in Knowledge Discovery and Data Mining Article When facing high-dimensional data streams, clustering algorithms quickly reach the boundaries of their usefulness as most of these methods are not designed to deal with the curse of dimensionality. Due to inherent sparsity in high-dimensional data, distances between objects tend to become meaningless since the distances between any two objects measured in the full dimensional space tend to become the same for all pairs of objects. In this work, we present a novel oriented subspace clustering algorithm that is able to deal with such issues and detects arbitrarily oriented subspace clusters in high-dimensional data streams. Data streams generally implicate the challenge that the data cannot be stored entirely and hence there is a general demand for suitable data handling strategies for clustering algorithms such that the data can be processed within a single scan. We therefore propose the CashStream algorithm that unites state-of-the-art stream processing techniques and additionally relies on the Hough transform to detect arbitrarily oriented subspace clusters. Our experiments compare CashStream to its static counterpart and show that the amount of consumed memory is significantly decreased while there is no loss in terms of runtime. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this chapter (10.1007/978-3-030-47426-3_28) contains supplementary material, which is available to authorized users. 2020-04-17 /pmc/articles/PMC7206268/ http://dx.doi.org/10.1007/978-3-030-47426-3_28 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. 
These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Borutta, Felix
Kazempour, Daniyal
Mathy, Felix
Kröger, Peer
Seidl, Thomas
Detecting Arbitrarily Oriented Subspace Clusters in Data Streams Using Hough Transform
title Detecting Arbitrarily Oriented Subspace Clusters in Data Streams Using Hough Transform
title_full Detecting Arbitrarily Oriented Subspace Clusters in Data Streams Using Hough Transform
title_fullStr Detecting Arbitrarily Oriented Subspace Clusters in Data Streams Using Hough Transform
title_full_unstemmed Detecting Arbitrarily Oriented Subspace Clusters in Data Streams Using Hough Transform
title_short Detecting Arbitrarily Oriented Subspace Clusters in Data Streams Using Hough Transform
title_sort detecting arbitrarily oriented subspace clusters in data streams using hough transform
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206268/
http://dx.doi.org/10.1007/978-3-030-47426-3_28