
Renormalization Analysis of Topic Models

In practice, to build a machine learning model of big data, one needs to tune model parameters. The process of parameter tuning involves extremely time-consuming and computationally expensive grid search. However, the theory of statistical physics provides techniques allowing us to optimize this pro...

Bibliographic Details
Main authors: Koltcov, Sergei; Ignatenko, Vera
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7517079/
https://www.ncbi.nlm.nih.gov/pubmed/33286328
http://dx.doi.org/10.3390/e22050556
author Koltcov, Sergei
Ignatenko, Vera
collection PubMed
description In practice, to build a machine learning model of big data, one needs to tune model parameters. The process of parameter tuning involves extremely time-consuming and computationally expensive grid search. However, the theory of statistical physics provides techniques allowing us to optimize this process. The paper shows that a function of the output of topic modeling demonstrates self-similar behavior under variation of the number of clusters. Such behavior allows using a renormalization technique. A combination of renormalization procedure with the Renyi entropy approach allows for quick searching of the optimal number of topics. In this paper, the renormalization procedure is developed for the probabilistic Latent Semantic Analysis (pLSA), and the Latent Dirichlet Allocation model with variational Expectation–Maximization algorithm (VLDA) and the Latent Dirichlet Allocation model with granulated Gibbs sampling procedure (GLDA). The experiments were conducted on two test datasets with a known number of topics in two different languages and on one unlabeled test dataset with an unknown number of topics. The paper shows that the renormalization procedure allows for finding an approximation of the optimal number of topics at least 30 times faster than the grid search without significant loss of quality.
format Online
Article
Text
id pubmed-7517079
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7517079 2020-11-09 Renormalization Analysis of Topic Models Koltcov, Sergei Ignatenko, Vera Entropy (Basel) Article MDPI 2020-05-16 /pmc/articles/PMC7517079/ /pubmed/33286328 http://dx.doi.org/10.3390/e22050556 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Renormalization Analysis of Topic Models
topic Article