
SOTXTSTREAM: Density-based self-organizing clustering of text streams

Bibliographic Details
Main Authors: Bryant, Avory C.; Cios, Krzysztof J.
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5501566/
https://www.ncbi.nlm.nih.gov/pubmed/28686655
http://dx.doi.org/10.1371/journal.pone.0180543
author Bryant, Avory C.
Cios, Krzysztof J.
collection PubMed
description A streaming data clustering algorithm is presented building upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets.
format Online
Article
Text
id pubmed-5501566
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5501566 2017-07-25 SOTXTSTREAM: Density-based self-organizing clustering of text streams Bryant, Avory C. Cios, Krzysztof J. PLoS One Research Article Public Library of Science 2017-07-07 /pmc/articles/PMC5501566/ /pubmed/28686655 http://dx.doi.org/10.1371/journal.pone.0180543 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title SOTXTSTREAM: Density-based self-organizing clustering of text streams
topic Research Article
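
The abstract in this record describes keeping a clustering solution online in a single phase, using local (nearest neighbor-based) density and self-organization of micro-clusters. The Python sketch below illustrates that general idea only. It is not the authors' SOSTREAM or SOTXTSTREAM algorithm: the class and parameter names (OnlineDensityClusterer, min_pts, alpha, merge_dist), the Euclidean distance, and the specific update and merge rules are all assumptions made for illustration. A text stream would first need each document turned into a numeric vector, which is not shown here.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class MicroCluster:
    """A centroid plus the number of points it has absorbed."""

    def __init__(self, point):
        self.centroid = list(point)
        self.count = 1


class OnlineDensityClusterer:
    """Illustrative one-pass clusterer (hypothetical, not the published method)."""

    def __init__(self, min_pts=2, alpha=0.1, merge_dist=0.05):
        self.min_pts = min_pts        # neighbors used for the local radius
        self.alpha = alpha            # how strongly neighbors are pulled
        self.merge_dist = merge_dist  # clusters closer than this are merged
        self.clusters = []

    def _local_radius(self, winner):
        # Local density estimate: distance from the winner to its
        # min_pts-th nearest other micro-cluster (None if too few exist).
        others = sorted(
            dist(winner.centroid, c.centroid)
            for c in self.clusters
            if c is not winner
        )
        if len(others) < self.min_pts:
            return None
        return others[self.min_pts - 1]

    def partial_fit(self, point):
        if not self.clusters:
            self.clusters.append(MicroCluster(point))
            return
        winner = min(self.clusters, key=lambda c: dist(point, c.centroid))
        radius = self._local_radius(winner)
        if radius is None or dist(point, winner.centroid) > radius:
            # Point lies outside the winner's local neighborhood.
            self.clusters.append(MicroCluster(point))
            return
        # Absorb the point with an incremental mean update.
        winner.count += 1
        winner.centroid = [
            c + (x - c) / winner.count for c, x in zip(winner.centroid, point)
        ]
        # Self-organization step: pull nearby micro-clusters toward the
        # winner, then merge any that end up closer than merge_dist.
        survivors = []
        for c in self.clusters:
            if c is winner:
                survivors.append(c)
                continue
            d = dist(c.centroid, winner.centroid)
            if d <= radius:
                c.centroid = [
                    v + self.alpha * (w - v)
                    for v, w in zip(c.centroid, winner.centroid)
                ]
                d = dist(c.centroid, winner.centroid)
            if d < self.merge_dist:
                total = winner.count + c.count
                winner.centroid = [
                    (w * winner.count + v * c.count) / total
                    for w, v in zip(winner.centroid, c.centroid)
                ]
                winner.count = total
            else:
                survivors.append(c)
        self.clusters = survivors


if __name__ == "__main__":
    model = OnlineDensityClusterer()
    for p in [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (0.05, 0.05)]:
        model.partial_fit(p)
    for c in model.clusters:
        print(c.centroid, c.count)
```

Because the radius is computed per winner from its own neighbors, regions of different density get different absorption thresholds, which is the property the abstract attributes to local density determinations. Merging during the online update is one simple way to avoid a separate offline phase; the published algorithms may handle this differently.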