A Hybrid Model with New Word Weighting for Fast Filtering Spam Short Texts
Short message services (SMS), microblogging tools, instant messaging apps, and commercial websites produce numerous short text messages every day. These short text messages are usually guaranteed to reach a mass audience at low cost. Spammers take advantage of short texts by sending bulk malicious or...
Main authors: | Xia, Tian; Chen, Xuemin; Wang, Jiacun; Qiu, Feng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10649562/ https://www.ncbi.nlm.nih.gov/pubmed/37960672 http://dx.doi.org/10.3390/s23218975 |
_version_ | 1785135580773875712 |
---|---|
author | Xia, Tian Chen, Xuemin Wang, Jiacun Qiu, Feng |
author_facet | Xia, Tian Chen, Xuemin Wang, Jiacun Qiu, Feng |
author_sort | Xia, Tian |
collection | PubMed |
description | Short message services (SMS), microblogging tools, instant messaging apps, and commercial websites produce numerous short text messages every day. These short text messages are usually guaranteed to reach a mass audience at low cost. Spammers take advantage of short texts by sending bulk malicious or unwanted messages. Short texts are difficult to classify because of their shortness, sparsity, rapidness, and informal writing. The effectiveness of the hidden Markov model (HMM) for short text classification has been illustrated in our previous study. However, the HMM has limited capability to handle new words, which are mostly generated by informal writing. In this paper, a hybrid model is proposed to address the informal writing issue by weighting new words for fast short text filtering with high accuracy. The hybrid model consists of an artificial neural network (ANN) and an HMM, which are used for new word weighting and spam filtering, respectively. The weight of a new word is calculated based on the weights of its neighbor, along with the spam and ham (i.e., not spam) probabilities of the short text message predicted by the ANN. Performance evaluations are conducted on benchmark datasets, including the SMS message data maintained by the University of California, Irvine, movie reviews, and customer reviews. The hybrid model operates at a significantly higher speed than deep learning models. The experimental results show that the proposed hybrid model outperforms other prominent machine learning algorithms, achieving a good balance between filtering throughput and accuracy. |
format | Online Article Text |
id | pubmed-10649562 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10649562 2023-11-04 A Hybrid Model with New Word Weighting for Fast Filtering Spam Short Texts Xia, Tian Chen, Xuemin Wang, Jiacun Qiu, Feng Sensors (Basel) Article MDPI 2023-11-04 /pmc/articles/PMC10649562/ /pubmed/37960672 http://dx.doi.org/10.3390/s23218975 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Xia, Tian Chen, Xuemin Wang, Jiacun Qiu, Feng A Hybrid Model with New Word Weighting for Fast Filtering Spam Short Texts |
title | A Hybrid Model with New Word Weighting for Fast Filtering Spam Short Texts |
title_sort | hybrid model with new word weighting for fast filtering spam short texts |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10649562/ https://www.ncbi.nlm.nih.gov/pubmed/37960672 http://dx.doi.org/10.3390/s23218975 |