
Topic modeling revisited: New evidence on algorithm performance and quality metrics



Bibliographic Details
Main Authors: Rüdiger, Matthias, Antons, David, Joshi, Amol M., Salge, Torsten-Oliver
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9049322/
https://www.ncbi.nlm.nih.gov/pubmed/35482786
http://dx.doi.org/10.1371/journal.pone.0266325
_version_ 1784696119086809088
author Rüdiger, Matthias
Antons, David
Joshi, Amol M.
Salge, Torsten-Oliver
author_facet Rüdiger, Matthias
Antons, David
Joshi, Amol M.
Salge, Torsten-Oliver
author_sort Rüdiger, Matthias
collection PubMed
description Topic modeling is a popular technique for exploring large document collections. It has proven useful for this task, but its application poses a number of challenges. First, the comparison of available algorithms is anything but simple, as researchers use many different datasets and criteria for their evaluation. A second challenge is the choice of a suitable metric for evaluating the calculated results. The metrics used so far provide a mixed picture, making it difficult to verify the accuracy of topic modeling outputs. Altogether, the choice of an appropriate algorithm and the evaluation of the results remain unresolved issues. Although many studies have reported promising performance by various topic models, prior research has not yet systematically investigated the validity of the outcomes in a comprehensive manner, that is, using more than a small number of the available algorithms and metrics. Consequently, our study has two main objectives. First, we compare all commonly used, non-application-specific topic modeling algorithms and assess their relative performance. The comparison is made against a known clustering and thus enables an unbiased evaluation of results. Our findings show a clear ranking of the algorithms in terms of accuracy. Secondly, we analyze the relationship between existing metrics and the known clustering, and thus objectively determine under what conditions these algorithms may be utilized effectively. This way, we enable readers to gain a deeper understanding of the performance of topic modeling techniques and the interplay of performance and evaluation metrics.
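The evaluation idea described above can be illustrated with a minimal sketch (this is not the authors' code, and the toy corpus, library choices, and metric are illustrative assumptions): fit a topic model, assign each document its dominant topic, and score that assignment against a known clustering using a measure such as normalized mutual information.

```python
# Minimal sketch of evaluating a topic model against a known clustering.
# Toy data and scikit-learn usage are illustrative assumptions, not the
# study's actual benchmark setup.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics import normalized_mutual_info_score

# Hypothetical toy corpus with two known clusters (animals vs. finance).
docs = [
    "cat dog pet animal fur",
    "dog puppy pet animal bark",
    "cat kitten pet fur meow",
    "stock market price trade money",
    "bank money price market invest",
    "trade stock invest bank price",
]
true_labels = [0, 0, 0, 1, 1, 1]  # the known clustering

# Fit LDA and take each document's dominant topic as its predicted cluster.
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
pred_labels = lda.transform(X).argmax(axis=1)

# NMI is 1.0 for a perfect match with the known clusters, 0.0 for no relation;
# it is invariant to topic-label permutation, so no label alignment is needed.
nmi = normalized_mutual_info_score(true_labels, pred_labels)
print(f"NMI vs. known clustering: {nmi:.3f}")
```

Because NMI ignores how cluster labels are numbered, this kind of comparison sidesteps the label-matching problem that a raw accuracy score would have.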
format Online
Article
Text
id pubmed-9049322
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9049322 2022-04-29 Topic modeling revisited: New evidence on algorithm performance and quality metrics Rüdiger, Matthias Antons, David Joshi, Amol M. Salge, Torsten-Oliver PLoS One Research Article
Public Library of Science 2022-04-28 /pmc/articles/PMC9049322/ /pubmed/35482786 http://dx.doi.org/10.1371/journal.pone.0266325 Text en © 2022 Rüdiger et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Rüdiger, Matthias
Antons, David
Joshi, Amol M.
Salge, Torsten-Oliver
Topic modeling revisited: New evidence on algorithm performance and quality metrics
title Topic modeling revisited: New evidence on algorithm performance and quality metrics
title_full Topic modeling revisited: New evidence on algorithm performance and quality metrics
title_fullStr Topic modeling revisited: New evidence on algorithm performance and quality metrics
title_full_unstemmed Topic modeling revisited: New evidence on algorithm performance and quality metrics
title_short Topic modeling revisited: New evidence on algorithm performance and quality metrics
title_sort topic modeling revisited: new evidence on algorithm performance and quality metrics
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9049322/
https://www.ncbi.nlm.nih.gov/pubmed/35482786
http://dx.doi.org/10.1371/journal.pone.0266325
work_keys_str_mv AT rudigermatthias topicmodelingrevisitednewevidenceonalgorithmperformanceandqualitymetrics
AT antonsdavid topicmodelingrevisitednewevidenceonalgorithmperformanceandqualitymetrics
AT joshiamolm topicmodelingrevisitednewevidenceonalgorithmperformanceandqualitymetrics
AT salgetorstenoliver topicmodelingrevisitednewevidenceonalgorithmperformanceandqualitymetrics