SOTXTSTREAM: Density-based self-organizing clustering of text streams
A streaming data clustering algorithm is presented building upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through...
Main Authors: | Bryant, Avory C., Cios, Krzysztof J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2017 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5501566/ https://www.ncbi.nlm.nih.gov/pubmed/28686655 http://dx.doi.org/10.1371/journal.pone.0180543 |
_version_ | 1783248808818245632 |
---|---|
author | Bryant, Avory C. Cios, Krzysztof J. |
author_facet | Bryant, Avory C. Cios, Krzysztof J. |
author_sort | Bryant, Avory C. |
collection | PubMed |
description | A streaming data clustering algorithm is presented building upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets. |
format | Online Article Text |
id | pubmed-5501566 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-55015662017-07-25 SOTXTSTREAM: Density-based self-organizing clustering of text streams Bryant, Avory C. Cios, Krzysztof J. PLoS One Research Article A streaming data clustering algorithm is presented building upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets. Public Library of Science 2017-07-07 /pmc/articles/PMC5501566/ /pubmed/28686655 http://dx.doi.org/10.1371/journal.pone.0180543 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication. |
spellingShingle | Research Article Bryant, Avory C. Cios, Krzysztof J. SOTXTSTREAM: Density-based self-organizing clustering of text streams |
title | SOTXTSTREAM: Density-based self-organizing clustering of text streams |
title_full | SOTXTSTREAM: Density-based self-organizing clustering of text streams |
title_fullStr | SOTXTSTREAM: Density-based self-organizing clustering of text streams |
title_full_unstemmed | SOTXTSTREAM: Density-based self-organizing clustering of text streams |
title_short | SOTXTSTREAM: Density-based self-organizing clustering of text streams |
title_sort | sotxtstream: density-based self-organizing clustering of text streams |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5501566/ https://www.ncbi.nlm.nih.gov/pubmed/28686655 http://dx.doi.org/10.1371/journal.pone.0180543 |
work_keys_str_mv | AT bryantavoryc sotxtstreamdensitybasedselforganizingclusteringoftextstreams AT cioskrzysztofj sotxtstreamdensitybasedselforganizingclusteringoftextstreams |