Selection of the Optimal Number of Topics for LDA Topic Model—Taking Patent Policy Analysis as an Example

This study constructs a comprehensive index to effectively judge the optimal number of topics in the LDA topic model. Based on the requirements for selecting the number of topics, a comprehensive judgment index of perplexity, isolation, stability, and coincidence is constructed to select the number...

Bibliographic Details
Main Authors: Gan, Jingxian; Qi, Yong
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8534395/
https://www.ncbi.nlm.nih.gov/pubmed/34682025
http://dx.doi.org/10.3390/e23101301
_version_ 1784587542439395328
author Gan, Jingxian
Qi, Yong
collection PubMed
description This study constructs a comprehensive index to effectively judge the optimal number of topics in the LDA topic model. Based on the requirements for selecting the number of topics, a comprehensive judgment index of perplexity, isolation, stability, and coincidence is constructed to select the number of topics. This method provides four advantages to selecting the optimal number of topics: (1) good predictive ability, (2) high isolation between topics, (3) no duplicate topics, and (4) repeatability. First, we use three general datasets to compare our proposed method with existing methods, and the results show that the optimal topic number selection method has better selection results. Then, we collected the patent policies of various provinces and cities in China (excluding Hong Kong, Macao, and Taiwan) as datasets. By using the optimal topic number selection method proposed in this study, we can classify patent policies well.
format Online
Article
Text
id pubmed-8534395
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8534395 2021-10-23 Entropy (Basel) Article MDPI 2021-10-03 /pmc/articles/PMC8534395/ /pubmed/34682025 http://dx.doi.org/10.3390/e23101301 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Selection of the Optimal Number of Topics for LDA Topic Model—Taking Patent Policy Analysis as an Example
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8534395/
https://www.ncbi.nlm.nih.gov/pubmed/34682025
http://dx.doi.org/10.3390/e23101301