
Context-Aware Latent Dirichlet Allocation for Topic Segmentation

We propose a new generative model for topic segmentation based on Latent Dirichlet Allocation. The task is to divide a document into a sequence of topically coherent segments, while preserving long topic change-points (coherency) and keeping short topic segments from getting merged (saliency). Most...


Bibliographic Details
Main authors: Li, Wenbo, Matsukawa, Tetsu, Saigo, Hiroto, Suzuki, Einoshin
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206242/
http://dx.doi.org/10.1007/978-3-030-47426-3_37
author Li, Wenbo
Matsukawa, Tetsu
Saigo, Hiroto
Suzuki, Einoshin
collection PubMed
description We propose a new generative model for topic segmentation based on Latent Dirichlet Allocation. The task is to divide a document into a sequence of topically coherent segments, while preserving long topic change-points (coherency) and keeping short topic segments from getting merged (saliency). Most of the existing models either fuse topic segments by keywords or focus on modeling word co-occurrence patterns without merging. They can hardly achieve both coherency and saliency since many words have high uncertainties in topic assignments due to their polysemous nature. To solve this problem, we introduce topic-specific co-occurrence of word pairs within contexts in modeling, to generate more coherent segments and alleviate the influence of irrelevant words on topic assignment. We also design an optimization algorithm to eliminate redundant items in the generated topic segments. Experimental results show that our proposal produces significant improvements in both topic coherence and topic segmentation.
format Online
Article
Text
id pubmed-7206242
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7206242 2020-05-08 Context-Aware Latent Dirichlet Allocation for Topic Segmentation Li, Wenbo Matsukawa, Tetsu Saigo, Hiroto Suzuki, Einoshin Advances in Knowledge Discovery and Data Mining Article 2020-04-17 /pmc/articles/PMC7206242/ http://dx.doi.org/10.1007/978-3-030-47426-3_37 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Context-Aware Latent Dirichlet Allocation for Topic Segmentation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206242/
http://dx.doi.org/10.1007/978-3-030-47426-3_37