Improving Latent Dirichlet Allocation: On Reliability of the Novel Method LDAPrototype
Main authors: | Rieger, Jonas; Rahnenführer, Jörg; Jentsch, Carsten |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7298183/ http://dx.doi.org/10.1007/978-3-030-51310-8_11 |
author | Rieger, Jonas; Rahnenführer, Jörg; Jentsch, Carsten |
collection | PubMed |
description | A large number of applications in text data analysis use Latent Dirichlet Allocation (LDA), one of the most popular methods in topic modeling. Although the instability of LDA is sometimes mentioned, it is usually not considered systematically. Instead, an LDA is often selected from a small set of LDAs using heuristics or human coding, and conclusions are then drawn from this somewhat arbitrarily selected model. We present the novel method LDAPrototype, which takes the instability of LDA into account, and show that systematically selecting an LDA improves the reliability of the conclusions drawn from the result and thus provides better reproducibility. We demonstrate the improvement from this selection criterion by applying the proposed methods to an example corpus of texts published in a German quality newspaper over one month. |
format | Online Article Text |
id | pubmed-7298183 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
source | Natural Language Processing and Information Systems (Article), 2020-05-26 |
record | pubmed-7298183, 2020-06-17 |
rights | © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
topic | Article |
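The abstract describes choosing one LDA out of several runs systematically rather than by heuristics. A minimal Python sketch of that general idea is given below. It assumes each LDA run has already been fitted and reduced to a set of top words per topic, and it picks the run with the highest mean similarity to all other runs (the medoid of the runs). The best-match Jaccard similarity used here is a hypothetical stand-in chosen for brevity, not the similarity measure defined for LDAPrototype, and the toy runs are invented for illustration.

```python
from itertools import combinations
from typing import List, Sequence, Set

Topic = Set[str]          # top words of one topic
Model = Sequence[Topic]   # one fitted LDA run, reduced to its topics (non-empty)


def jaccard(a: Topic, b: Topic) -> float:
    """Jaccard coefficient of two top-word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def model_similarity(m1: Model, m2: Model) -> float:
    """Hypothetical stand-in similarity between two runs: match each topic
    to its best counterpart in the other run, average, and symmetrize."""
    def one_way(x: Model, y: Model) -> float:
        return sum(max(jaccard(t, u) for u in y) for t in x) / len(x)
    return (one_way(m1, m2) + one_way(m2, m1)) / 2


def select_prototype(models: List[Model]) -> int:
    """Return the index of the run with the highest mean similarity
    to all other runs, i.e. the medoid of the repeated runs."""
    n = len(models)
    if n < 2:
        raise ValueError("need at least two runs to compare")
    sims = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        s = model_similarity(models[i], models[j])
        sims[i][j] = sims[j][i] = s
    mean_sim = [sum(row) / (n - 1) for row in sims]
    return max(range(n), key=lambda i: mean_sim[i])


# Invented toy example: four runs with two topics each.
runs = [
    [{"election", "vote", "party"}, {"match", "goal", "team"}],
    [{"election", "vote", "poll"}, {"match", "goal", "team"}],
    [{"election", "vote", "party"}, {"match", "goal", "coach"}],
    [{"weather", "rain", "storm"}, {"match", "goal", "team"}],
]
print(select_prototype(runs))  # → 0
```

Choosing the medoid rather than, say, the run with the best likelihood reflects the abstract's focus on reliability: the selected run is the one most representative of what repeated runs tend to produce.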