
Enhancement of Short Text Clustering by Iterative Classification

Short text clustering is a challenging task due to the lack of signal contained in short texts. In this work, we propose iterative classification as a method to boost the clustering quality of short texts. The idea is to repeatedly reassign (classify) outliers to clusters until the cluster assignmen...


Bibliographic Details
Main authors: Rakib, Md Rashadul Hasan, Zeh, Norbert, Jankowska, Magdalena, Milios, Evangelos
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7298194/
http://dx.doi.org/10.1007/978-3-030-51310-8_10
_version_ 1783547167092244480
author Rakib, Md Rashadul Hasan
Zeh, Norbert
Jankowska, Magdalena
Milios, Evangelos
author_sort Rakib, Md Rashadul Hasan
collection PubMed
description Short text clustering is a challenging task due to the lack of signal contained in short texts. In this work, we propose iterative classification as a method to boost the clustering quality of short texts. The idea is to repeatedly reassign (classify) outliers to clusters until the cluster assignment stabilizes. The classifier used in each iteration is trained using the current set of cluster labels of the non-outliers; the input of the first iteration is the output of an arbitrary clustering algorithm. Thus, our method does not require any human-annotated labels for training. Our experimental results show that the proposed clustering enhancement method not only improves the clustering quality of different baseline clustering methods (e.g., k-means, k-means--, and hierarchical clustering) but also outperforms the state-of-the-art short text clustering methods on several short text datasets by a statistically significant margin.
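The loop outlined in the description (flag outliers, train on the non-outliers' current cluster labels, reassign the outliers, repeat until the assignment stabilizes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not say which outlier detector or classifier they use, so the sketch swaps in two simple stand-ins. Outliers are the texts least similar (by cosine) to their own cluster's bag-of-words centroid, and the "classifier" is a nearest-centroid rule. The names `iterative_classification`, `outlier_fraction`, and `max_iter` are hypothetical.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term counts for one short text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroids(vectors, labels, members):
    """Sum the term counts of the given members, grouped by cluster label."""
    cents = {}
    for i in members:
        cents.setdefault(labels[i], Counter()).update(vectors[i])
    return cents


def iterative_classification(texts, initial_labels,
                             outlier_fraction=0.25, max_iter=20):
    """Reassign outliers until the cluster assignment stops changing.

    Assumptions (not from the source): the outlier rule, the
    nearest-centroid classifier, and both tuning parameters.
    """
    vectors = [vectorize(t) for t in texts]
    labels = list(initial_labels)  # output of any clustering algorithm
    indices = range(len(texts))
    n_out = int(len(texts) * outlier_fraction)

    for _ in range(max_iter):
        # Score each text by similarity to its own cluster's centroid.
        cents = centroids(vectors, labels, indices)
        scores = [cosine(vectors[i], cents[labels[i]]) for i in indices]
        order = sorted(indices, key=lambda i: scores[i])
        outliers, inliers = order[:n_out], order[n_out:]

        # "Train" on the non-outliers using their current cluster labels.
        model = centroids(vectors, labels, inliers)

        # Classify each outlier into the most similar cluster.
        new_labels = list(labels)
        for i in outliers:
            new_labels[i] = max(sorted(model),
                                key=lambda c: cosine(vectors[i], model[c]))

        if new_labels == labels:  # assignment has stabilized
            break
        labels = new_labels
    return labels
```

No human-annotated labels enter the loop: the only supervision is the current cluster assignment of the non-outliers, which matches the description. In practice one would likely substitute a stronger text representation and classifier for the stand-ins used here.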
format Online
Article
Text
id pubmed-7298194
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-72981942020-06-17 Enhancement of Short Text Clustering by Iterative Classification Rakib, Md Rashadul Hasan Zeh, Norbert Jankowska, Magdalena Milios, Evangelos Natural Language Processing and Information Systems Article Short text clustering is a challenging task due to the lack of signal contained in short texts. In this work, we propose iterative classification as a method to boost the clustering quality of short texts. The idea is to repeatedly reassign (classify) outliers to clusters until the cluster assignment stabilizes. The classifier used in each iteration is trained using the current set of cluster labels of the non-outliers; the input of the first iteration is the output of an arbitrary clustering algorithm. Thus, our method does not require any human-annotated labels for training. Our experimental results show that the proposed clustering enhancement method not only improves the clustering quality of different baseline clustering methods (e.g., k-means, k-means--, and hierarchical clustering) but also outperforms the state-of-the-art short text clustering methods on several short text datasets by a statistically significant margin. 2020-05-26 /pmc/articles/PMC7298194/ http://dx.doi.org/10.1007/978-3-030-51310-8_10 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Enhancement of Short Text Clustering by Iterative Classification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7298194/
http://dx.doi.org/10.1007/978-3-030-51310-8_10