Text Feature Extraction for Public English Vocabulary Based on Wavelet Transform

Text interpretation of public English vocabulary is a critical task in natural language processing, the field that uses technology to let humans and computers communicate effectively in natural language. Text feature extraction is one of the most fundamental and crucial elements in all...

Bibliographic Details
Main Authors:	Ye, Di, Shi, Xiaojing
Format:	Online Article Text
Language:	English
Published:	Hindawi 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9206561/ https://www.ncbi.nlm.nih.gov/pubmed/35726227 http://dx.doi.org/10.1155/2022/7125242

_version_	1784729356812156928
author	Ye, Di Shi, Xiaojing
author_facet	Ye, Di Shi, Xiaojing
author_sort	Ye, Di
collection	PubMed
description	Text interpretation of public English vocabulary is a critical task in natural language processing, the field that uses technology to let humans and computers communicate effectively in natural language. Text feature extraction is one of the most fundamental and crucial steps in enabling computers to read and understand text. To address the low feature differentiation of high-dimensional data in text feature extraction, this paper proposes a method based on wavelet analysis that applies the fast discrete wavelet transform and the inverse discrete wavelet transform to feature vectors from the traditional TF-IDF vector space model. Because of the design of the Mallat algorithm, frequency aliasing arises during signal decomposition, a problem that cannot be ignored when wavelet analysis is used for feature extraction. The paper therefore proposes an improved inverse discrete wavelet transform method: the signal is decomposed by the Mallat algorithm to obtain wavelet coefficients at each scale, these are reconstructed into coefficients of the required wavelet space, and the reconstructed coefficients, rather than the wavelet coefficients obtained directly at that scale, are used to analyze the signal at that scale. Experiments on the public English vocabulary dataset show that the proposed wavelet transform-based strategy outperforms existing feature extraction methods, maintaining higher classification accuracy while reducing the dimensionality of the TF-IDF vector space model.
format	Online Article Text
id	pubmed-9206561
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Hindawi
record_format	MEDLINE/PubMed
spelling	pubmed-9206561 2022-06-19 Text Feature Extraction for Public English Vocabulary Based on Wavelet Transform Ye, Di Shi, Xiaojing Comput Math Methods Med Research Article Hindawi 2022-06-11 /pmc/articles/PMC9206561/ /pubmed/35726227 http://dx.doi.org/10.1155/2022/7125242 Text en Copyright © 2022 Di Ye and Xiaojing Shi. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Ye, Di Shi, Xiaojing Text Feature Extraction for Public English Vocabulary Based on Wavelet Transform
title	Text Feature Extraction for Public English Vocabulary Based on Wavelet Transform
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9206561/ https://www.ncbi.nlm.nih.gov/pubmed/35726227 http://dx.doi.org/10.1155/2022/7125242
work_keys_str_mv	AT yedi textfeatureextractionforpublicenglishvocabularybasedonwavelettransform AT shixiaojing textfeatureextractionforpublicenglishvocabularybasedonwavelettransform
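
The decompose-then-reconstruct idea summarized in the description field can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the Haar wavelet, the function names, and the toy eight-term TF-IDF vector are all assumptions, since the record does not name a wavelet or give any code. The sketch decomposes a vector with a Mallat-style filter-and-downsample cascade, then reconstructs each band back to full length on its own, so that the reconstructed single-band signal (rather than the raw downsampled coefficients) can be analyzed at that scale.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One Mallat analysis step with the Haar filters: filter, then downsample by 2."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """One Mallat synthesis step: upsample and combine approximation and detail."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def decompose(signal, levels):
    """Multilevel decomposition; returns [cA_n, cD_n, ..., cD_1].

    Assumes len(signal) is divisible by 2 ** levels.
    """
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def reconstruct_branch(coeffs, keep):
    """Reconstruct a full-length signal from a single band.

    Every band except coeffs[keep] is zeroed before running the inverse
    transform, so the result is that band's contribution at full length.
    """
    kept = [band if i == keep else [0.0] * len(band) for i, band in enumerate(coeffs)]
    approx = kept[0]
    for detail in kept[1:]:
        approx = haar_inverse_step(approx, detail)
    return approx


# Hypothetical TF-IDF weights for one document over an 8-term vocabulary.
tfidf_vector = [0.0, 0.5, 0.1, 0.0, 0.3, 0.0, 0.0, 0.2]

coeffs = decompose(tfidf_vector, levels=2)
branches = [reconstruct_branch(coeffs, i) for i in range(len(coeffs))]

# The coarse-scale branch is a smoothed, full-length version of the vector.
smoothed = branches[0]
```

Because the transform is linear, the single-band reconstructions add back up to the original vector, which gives a quick sanity check on the sketch. How the reconstructed bands are then turned into a lower-dimensional feature set is not stated in this record and is left open here.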
