A supervised topic embedding model and its application

We propose rTopicVec, a supervised topic embedding model that predicts response variables associated with documents by analyzing the text data. Topic modeling leverages document-level word co-occurrence patterns to learn latent topics of each document. While word embedding is a promising text analys...

Bibliographic Details
Main Authors: Xu, Weiran; Eguchi, Koji
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9635756/
https://www.ncbi.nlm.nih.gov/pubmed/36331905
http://dx.doi.org/10.1371/journal.pone.0277104
_version_ 1784824780557385728
author Xu, Weiran
Eguchi, Koji
collection PubMed
description We propose rTopicVec, a supervised topic embedding model that predicts response variables associated with documents by analyzing the text data. Topic modeling leverages document-level word co-occurrence patterns to learn the latent topics of each document. Word embedding, by contrast, is a promising text analysis technique in which words are mapped into a low-dimensional continuous semantic space by exploiting local word co-occurrence patterns within a small context window. Recently developed topic embedding benefits from combining these two approaches by modeling latent topics in a word embedding space. Our proposed rTopicVec and its regularized variant incorporate regression into the topic embedding model to jointly model each document and the numerical label paired with it. In addition, our models yield topics that are predictive of the response variables and predict response variables for unlabeled documents. We evaluated the effectiveness of our models through experiments on two regression tasks: predicting stock return rates using news articles provided by Thomson Reuters and predicting movie ratings using movie reviews. Results showed that our models predicted more accurately than three baselines, with a statistically significant difference.
format Online
Article
Text
id pubmed-9635756
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9635756 2022-11-05 A supervised topic embedding model and its application Xu, Weiran Eguchi, Koji PLoS One Research Article Public Library of Science 2022-11-04 /pmc/articles/PMC9635756/ /pubmed/36331905 http://dx.doi.org/10.1371/journal.pone.0277104 Text en © 2022 Xu, Eguchi https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A supervised topic embedding model and its application
topic Research Article