
A Topic Recognition Method of News Text Based on Word Embedding Enhancement

Topic recognition technology has been commonly applied to identify different categories of news topics from the vast amount of web information, which has a wide application prospect in the field of online public opinion monitoring, news recommendation, and so on. However, it is very challenging to e...


Bibliographic Details
Main Authors: Du, Qiming, Li, Nan, Liu, Wenfu, Sun, Daozhu, Yang, Shudan, Yue, Feng
Format: Online Article Text
Language: English
Published: Hindawi 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8865979/
https://www.ncbi.nlm.nih.gov/pubmed/35222628
http://dx.doi.org/10.1155/2022/4582480
_version_ 1784655736014372864
author Du, Qiming
Li, Nan
Liu, Wenfu
Sun, Daozhu
Yang, Shudan
Yue, Feng
author_sort Du, Qiming
collection PubMed
description Topic recognition is widely used to identify categories of news topics in the vast amount of web information, with broad application prospects in fields such as online public opinion monitoring and news recommendation. However, effectively exploiting key feature information in the text, such as syntax and semantics, to improve topic recognition accuracy remains very challenging. Some researchers have proposed combining a topic model with a word embedding model, and their results show that this approach can enrich text representation and benefit downstream natural language processing tasks. For topic recognition of news text, however, there is currently no standard way of combining the two kinds of model. Moreover, some existing approaches of this kind are relatively complex and do not consider fusing topic distributions of different granularity with word embedding information. This paper therefore proposes a novel text representation method based on word embedding enhancement and builds on it a full-process topic recognition framework for news text. In contrast to traditional topic recognition methods, the framework uses the probabilistic topic model LDA together with the word embedding models Word2vec and GloVe to extract and integrate the topic distribution, semantic knowledge, and syntactic relationships of the text, and then applies popular classifiers to the resulting text representation vectors to recognize news topic categories automatically. The framework can thus exploit both the document-topic relationship and context information, which improves expressive ability and reduces dimensionality. Experimental results on two benchmark datasets, 20NewsGroup and BBC News, verify the effectiveness and superiority of the proposed method for news topic recognition.
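The description above says that topic distributions and word embedding information are fused into one text representation vector, but it does not state the fusion rule. The following is only a minimal sketch under a loud assumption: fusion is taken to be plain concatenation of a document's topic distribution with the mean of its word vectors from two embedding tables. The function names (`mean_vector`, `fuse`) and all numeric values are hypothetical toy stand-ins, not the authors' method and not the output of real LDA, Word2vec, or GloVe models.

```python
from typing import Dict, List


def mean_vector(tokens: List[str],
                embeddings: Dict[str, List[float]],
                dim: int) -> List[float]:
    """Average the vectors of the tokens found in `embeddings`.

    Tokens without an embedding are skipped; a zero vector is
    returned when no token has one.
    """
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def fuse(topic_dist: List[float],
         *embedding_means: List[float]) -> List[float]:
    """ASSUMPTION: fuse by concatenation (the source does not specify this)."""
    fused = list(topic_dist)
    for mean in embedding_means:
        fused.extend(mean)
    return fused


# Hypothetical toy inputs standing in for trained models.
toy_topic_dist = [0.75, 0.25]                         # stand-in for an LDA document-topic vector
toy_w2v = {"match": [1.0, 0.0], "goal": [0.0, 1.0]}   # stand-in for Word2vec vectors
toy_glove = {"match": [0.5, 0.5], "goal": [1.5, 0.5]} # stand-in for GloVe vectors

tokens = ["match", "goal", "unknownword"]
doc_vector = fuse(
    toy_topic_dist,
    mean_vector(tokens, toy_w2v, dim=2),
    mean_vector(tokens, toy_glove, dim=2),
)
print(doc_vector)
```

In a full pipeline of the kind the description outlines, a vector like `doc_vector` would then be passed to an ordinary classifier to predict the news topic category.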
format Online
Article
Text
id pubmed-8865979
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-8865979 2022-02-24 A Topic Recognition Method of News Text Based on Word Embedding Enhancement Du, Qiming Li, Nan Liu, Wenfu Sun, Daozhu Yang, Shudan Yue, Feng Comput Intell Neurosci Research Article Hindawi 2022-02-16 /pmc/articles/PMC8865979/ /pubmed/35222628 http://dx.doi.org/10.1155/2022/4582480 Text en Copyright © 2022 Qiming Du et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Topic Recognition Method of News Text Based on Word Embedding Enhancement
topic Research Article