A Hybrid Model with New Word Weighting for Fast Filtering Spam Short Texts

Short message services (SMS), microblogging tools, instant messaging apps, and commercial websites produce numerous short text messages every day. These short text messages are usually guaranteed to reach a mass audience at low cost. Spammers take advantage of short texts by sending bulk malicious or...

Bibliographic Details
Main Authors:	Xia, Tian; Chen, Xuemin; Wang, Jiacun; Qiu, Feng
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10649562/ https://www.ncbi.nlm.nih.gov/pubmed/37960672 http://dx.doi.org/10.3390/s23218975

collection	PubMed
description	Short message services (SMS), microblogging tools, instant messaging apps, and commercial websites produce numerous short text messages every day. These short text messages are usually guaranteed to reach a mass audience at low cost. Spammers take advantage of short texts by sending bulk malicious or unwanted messages. Short texts are difficult to classify because of their brevity, sparsity, rapidness, and informal writing. The effectiveness of the hidden Markov model (HMM) for short text classification was demonstrated in our previous study. However, the HMM has limited capability to handle new words, which are mostly generated by informal writing. In this paper, a hybrid model is proposed that addresses the informal writing issue by weighting new words, enabling fast short text filtering with high accuracy. The hybrid model consists of an artificial neural network (ANN) and an HMM, which are used for new word weighting and spam filtering, respectively. The weight of a new word is calculated based on the weights of its neighbor, along with the spam and ham (i.e., not spam) probabilities of the short text message predicted by the ANN. Performance evaluations are conducted on benchmark datasets, including the SMS message data maintained by the University of California, Irvine, movie reviews, and customer reviews. The hybrid model operates at a significantly higher speed than deep learning models. The experimental results show that the proposed hybrid model outperforms other prominent machine learning algorithms, achieving a good balance between filtering throughput and accuracy.
id	pubmed-10649562
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-10649562 2023-11-04 Sensors (Basel) Article MDPI Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).