
Text Feature Extraction for Public English Vocabulary Based on Wavelet Transform


Bibliographic Details
Main Authors: Ye, Di; Shi, Xiaojing
Format: Online Article Text
Language: English
Published: Hindawi, 2022
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9206561/
https://www.ncbi.nlm.nih.gov/pubmed/35726227
http://dx.doi.org/10.1155/2022/7125242
Collection: PubMed
Abstract: Text interpretation of public English vocabulary is a critical task in natural language processing, the field that uses technology to let humans and computers communicate effectively in natural language. Text feature extraction is one of the most fundamental and crucial steps in enabling computers to understand and read text effectively. To address the low feature differentiation of high-dimensional data in text feature extraction, this paper proposes a text feature extraction method based on wavelet analysis that applies the fast discrete wavelet transform and the inverse discrete wavelet transform to feature vectors built under the traditional TF-IDF vector space model. Because of the way the Mallat algorithm is designed, frequency aliasing occurs during signal decomposition, a problem that cannot be ignored when wavelet analysis is used for feature extraction. The paper therefore proposes an improved inverse discrete wavelet transform: the signal is decomposed with the Mallat algorithm to obtain wavelet coefficients at each scale, these coefficients are then reconstructed into the required wavelet-space coefficients, and the reconstructed coefficients, rather than the wavelet coefficients obtained directly at the corresponding scale, are used to analyze the signal at that scale. Experiments on the public English vocabulary dataset show that the proposed wavelet transform-based method outperforms existing feature extraction methods, achieving higher classification accuracy while reducing the dimensionality of the TF-IDF vector space model.
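The abstract only outlines the method, so the following is a minimal Python sketch under stated assumptions rather than the authors' implementation. It assumes the Haar wavelet (the record does not name the wavelet). It uses a plain TF-IDF weighting (tf = count / document length, idf = ln(N / df)) and treats each document's TF-IDF vector as a 1-D signal. It also reads the "improved inverse discrete wavelet transform" as single-branch reconstruction, in which each band is inverse-transformed on its own with all other bands set to zero. All function names (`tfidf_vectors`, `dwt`, `single_branch`, and so on) are hypothetical.

```python
import math
from collections import Counter

SQRT2 = math.sqrt(2.0)


def tfidf_vectors(docs):
    """Dense TF-IDF vectors over a sorted vocabulary (assumed weighting: count/len * ln(N/df)).
    Assumes every document has at least one token."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[w] / len(toks) * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors


def pad_to_multiple(x, block):
    """Zero-pad x so its length is a multiple of block (2**levels halvings must divide evenly)."""
    remainder = len(x) % block
    return list(x) + [0.0] * ((block - remainder) % block)


def haar_step(x):
    """One Mallat analysis step with the (assumed) Haar filter pair: filter, then downsample by 2."""
    approx = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(len(x) // 2)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(len(x) // 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Synthesis step: upsample and recombine; exactly inverts haar_step."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def dwt(x, levels):
    """Multi-level fast DWT (Mallat pyramid); returns [cA_L, cD_L, ..., cD_1]."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return [approx] + details[::-1]


def idwt(coeffs):
    """Inverse of dwt: rebuild the signal from [cA_L, cD_L, ..., cD_1]."""
    approx = coeffs[0]
    for d in coeffs[1:]:
        approx = haar_inverse_step(approx, d)
    return approx


def single_branch(coeffs, index):
    """Single-branch reconstruction: keep one band, zero the others, and inverse-transform,
    so that band is analyzed in signal space rather than through its raw coefficients."""
    kept = [c if i == index else [0.0] * len(c) for i, c in enumerate(coeffs)]
    return idwt(kept)


if __name__ == "__main__":
    # Toy corpus, purely illustrative; not the paper's dataset.
    docs = [
        "the wavelet transform splits a signal",
        "the tf idf model weights each word",
        "a signal is split by the mallat algorithm",
    ]
    vocab, vectors = tfidf_vectors(docs)
    levels = 2
    x = pad_to_multiple(vectors[0], 2 ** levels)
    coeffs = dwt(x, levels)
    reduced = coeffs[0]  # assumed reduced feature vector: level-L approximation coefficients
    branches = [single_branch(coeffs, i) for i in range(len(coeffs))]
    print(len(vocab), "TF-IDF features ->", len(reduced), "approximation features")
    print("max |sum of branches - x| =", max(abs(sum(c) - xi) for c, xi in zip(zip(*branches), x)))
```

Because the transform is linear and the Haar filters are orthonormal, the single-branch reconstructions add back up to the padded TF-IDF vector, which makes a convenient sanity check. Using the level-L approximation coefficients as the lower-dimensional feature vector is also an assumption here. The paper may select or combine the reconstructed scales differently.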
Record ID: pubmed-9206561
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Comput Math Methods Med (Research Article), Hindawi, 2022-06-11
Copyright © 2022 Di Ye and Xiaojing Shi. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.