
Seed-Guided Deep Document Clustering

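Different users may be interested in different clustering views underlying a given collection (e.g., topic and writing style in documents). Enabling them to provide constraints reflecting their needs can then help obtain tailored clustering results. For document clustering, constraints can be provided in the form of seed words, each cluster being characterized by a small set of words. This seed-guided constrained document clustering problem was recently addressed through topic modeling approaches. In this paper, we jointly learn deep representations and bias the clustering results through the seed words, leading to a Seed-guided Deep Document Clustering approach. Its effectiveness is demonstrated on five public datasets.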


Bibliographic Details
Main Authors: Fard, Mazar Moradi; Thonet, Thibaut; Gaussier, Eric
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148236/
http://dx.doi.org/10.1007/978-3-030-45439-5_1
author Fard, Mazar Moradi
Thonet, Thibaut
Gaussier, Eric
collection PubMed
description Different users may be interested in different clustering views underlying a given collection (e.g., topic and writing style in documents). Enabling them to provide constraints reflecting their needs can then help obtain tailored clustering results. For document clustering, constraints can be provided in the form of seed words, each cluster being characterized by a small set of words. This seed-guided constrained document clustering problem was recently addressed through topic modeling approaches. In this paper, we jointly learn deep representations and bias the clustering results through the seed words, leading to a Seed-guided Deep Document Clustering approach. Its effectiveness is demonstrated on five public datasets.
format Online
Article
Text
id pubmed-7148236
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7148236 2020-04-13
Seed-Guided Deep Document Clustering
Fard, Mazar Moradi; Thonet, Thibaut; Gaussier, Eric
Advances in Information Retrieval
Article
2020-03-17
/pmc/articles/PMC7148236/
http://dx.doi.org/10.1007/978-3-030-45439-5_1
Text en
© Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Seed-Guided Deep Document Clustering
topic Article