A Method of Short Text Representation Based on the Feature Probability Embedded Vector

Text representation is one of the key tasks in the field of natural language processing (NLP). Traditional feature extraction and weighting methods often use the bag-of-words (BoW) model, which may lead to a lack of semantic information as well as the problems of high dimensionality and high sparsit...

Bibliographic Details
Main Authors:	Zhou, Wanting, Wang, Hanbin, Sun, Hongguang, Sun, Tieli
Format:	Online Article Text
Language:	English
Published:	MDPI 2019
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6749449/ https://www.ncbi.nlm.nih.gov/pubmed/31466389 http://dx.doi.org/10.3390/s19173728

_version_	1783452281478316032
author	Zhou, Wanting Wang, Hanbin Sun, Hongguang Sun, Tieli
author_facet	Zhou, Wanting Wang, Hanbin Sun, Hongguang Sun, Tieli
author_sort	Zhou, Wanting
collection	PubMed
description	Text representation is one of the key tasks in the field of natural language processing (NLP). Traditional feature extraction and weighting methods often use the bag-of-words (BoW) model, which may lead to a lack of semantic information as well as the problems of high dimensionality and high sparsity. At present, to solve these problems, a popular idea is to utilize deep learning methods. In this paper, feature weighting, word embedding, and topic models are combined to propose an unsupervised text representation method named the feature, probability, and word embedding method. The main idea is to use the word embedding technology Word2Vec to obtain the word vector, and then combine this with the feature weighted TF-IDF and the topic model LDA. Compared with traditional feature engineering, the proposed method not only increases the expressive ability of the vector space model, but also reduces the dimensions of the document vector. Besides this, it can be used to solve the problems of the insufficient information, high dimensions, and high sparsity of BoW. We use the proposed method for the task of text categorization and verify the validity of the method.
format	Online Article Text
id	pubmed-6749449
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-6749449 2019-09-27 A Method of Short Text Representation Based on the Feature Probability Embedded Vector Zhou, Wanting Wang, Hanbin Sun, Hongguang Sun, Tieli Sensors (Basel) Article Text representation is one of the key tasks in the field of natural language processing (NLP). Traditional feature extraction and weighting methods often use the bag-of-words (BoW) model, which may lead to a lack of semantic information as well as the problems of high dimensionality and high sparsity. At present, to solve these problems, a popular idea is to utilize deep learning methods. In this paper, feature weighting, word embedding, and topic models are combined to propose an unsupervised text representation method named the feature, probability, and word embedding method. The main idea is to use the word embedding technology Word2Vec to obtain the word vector, and then combine this with the feature weighted TF-IDF and the topic model LDA. Compared with traditional feature engineering, the proposed method not only increases the expressive ability of the vector space model, but also reduces the dimensions of the document vector. Besides this, it can be used to solve the problems of the insufficient information, high dimensions, and high sparsity of BoW. We use the proposed method for the task of text categorization and verify the validity of the method. MDPI 2019-08-28 /pmc/articles/PMC6749449/ /pubmed/31466389 http://dx.doi.org/10.3390/s19173728 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Zhou, Wanting Wang, Hanbin Sun, Hongguang Sun, Tieli A Method of Short Text Representation Based on the Feature Probability Embedded Vector
title	A Method of Short Text Representation Based on the Feature Probability Embedded Vector
title_full	A Method of Short Text Representation Based on the Feature Probability Embedded Vector
title_fullStr	A Method of Short Text Representation Based on the Feature Probability Embedded Vector
title_full_unstemmed	A Method of Short Text Representation Based on the Feature Probability Embedded Vector
title_short	A Method of Short Text Representation Based on the Feature Probability Embedded Vector
title_sort	method of short text representation based on the feature probability embedded vector
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6749449/ https://www.ncbi.nlm.nih.gov/pubmed/31466389 http://dx.doi.org/10.3390/s19173728
