Improving Latent Dirichlet Allocation: On Reliability of the Novel Method LDAPrototype

A large number of applications in text data analysis use the Latent Dirichlet Allocation (LDA) as one of the most popular methods in topic modeling. Although the instability of the LDA is mentioned sometimes, it is usually not considered systematically. Instead, an LDA is often selected from a small...

Bibliographic Details
Main Authors: Rieger, Jonas; Rahnenführer, Jörg; Jentsch, Carsten
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7298183/
http://dx.doi.org/10.1007/978-3-030-51310-8_11
author Rieger, Jonas
Rahnenführer, Jörg
Jentsch, Carsten
collection PubMed
description A large number of applications in text data analysis use the Latent Dirichlet Allocation (LDA) as one of the most popular methods in topic modeling. Although the instability of the LDA is mentioned sometimes, it is usually not considered systematically. Instead, an LDA is often selected from a small set of LDAs using heuristic means or human codings. Then, conclusions are often drawn based on the to some extent arbitrarily selected model. We present the novel method LDAPrototype, which takes the instability of the LDA into account, and show that by systematically selecting an LDA it improves the reliability of the conclusions drawn from the result and thus provides better reproducibility. The improvement coming from this selection criterion is unveiled by applying the proposed methods to an example corpus consisting of texts published in a German quality newspaper over one month.
format Online
Article
Text
id pubmed-7298183
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7298183 2020-06-17 Natural Language Processing and Information Systems Article 2020-05-26 /pmc/articles/PMC7298183/ http://dx.doi.org/10.1007/978-3-030-51310-8_11 Text en
© Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Improving Latent Dirichlet Allocation: On Reliability of the Novel Method LDAPrototype
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7298183/
http://dx.doi.org/10.1007/978-3-030-51310-8_11