Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique
e-mail service providers and consumers find it challenging to distinguish between spam and nonspam e-mails. The purpose of spammers is to spread false information by sending annoying messages that catch the attention of the public. Various spam identification techniques have been suggested and evalu...
Main author: | |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9381222/ https://www.ncbi.nlm.nih.gov/pubmed/35983156 http://dx.doi.org/10.1155/2022/2500772 |
_version_ | 1784769030573260800 |
---|---|
author | Rayan, Alanazi |
author_facet | Rayan, Alanazi |
author_sort | Rayan, Alanazi |
collection | PubMed |
description | e-mail service providers and consumers find it challenging to distinguish between spam and nonspam e-mails. The purpose of spammers is to spread false information by sending annoying messages that catch the attention of the public. Various spam identification techniques have been suggested and evaluated in the past, but the results show that more research is required to improve accuracy and to reduce training time and error rate. Thus, this research proposes a novel machine learning-based hybrid bagging method for e-mail spam identification that combines two machine learning methods: random forest and J48 (decision tree). The proposed framework categorizes e-mail into ham and spam. In this procedure, the database is split into multiple sets, which are provided as input to each method. Tokenization, stemming, and stop word removal are performed in the preprocessing stage. Correlation feature selection (CFS) is then used to select the required features from the preprocessed data. The effectiveness of the presented method is evaluated in terms of true-negative rate, accuracy, recall, precision, false-positive rate, f-measure, and false-negative rate; the outcomes of three studies are compared. According to the results, the presented hybrid bagged model-based SMD technology achieved 98 percent accuracy. |
format | Online Article Text |
id | pubmed-9381222 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-9381222 2022-08-17 Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique Rayan, Alanazi Comput Intell Neurosci Research Article e-mail service providers and consumers find it challenging to distinguish between spam and nonspam e-mails. The purpose of spammers is to spread false information by sending annoying messages that catch the attention of the public. Various spam identification techniques have been suggested and evaluated in the past, but the results show that more research is required to improve accuracy and to reduce training time and error rate. Thus, this research proposes a novel machine learning-based hybrid bagging method for e-mail spam identification that combines two machine learning methods: random forest and J48 (decision tree). The proposed framework categorizes e-mail into ham and spam. In this procedure, the database is split into multiple sets, which are provided as input to each method. Tokenization, stemming, and stop word removal are performed in the preprocessing stage. Correlation feature selection (CFS) is then used to select the required features from the preprocessed data. The effectiveness of the presented method is evaluated in terms of true-negative rate, accuracy, recall, precision, false-positive rate, f-measure, and false-negative rate; the outcomes of three studies are compared. According to the results, the presented hybrid bagged model-based SMD technology achieved 98 percent accuracy. Hindawi 2022-08-09 /pmc/articles/PMC9381222/ /pubmed/35983156 http://dx.doi.org/10.1155/2022/2500772 Text en Copyright © 2022 Alanazi Rayan. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Rayan, Alanazi Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique |
title | Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique |
title_full | Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique |
title_fullStr | Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique |
title_full_unstemmed | Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique |
title_short | Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique |
title_sort | analysis of e-mail spam detection using a novel machine learning-based hybrid bagging technique |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9381222/ https://www.ncbi.nlm.nih.gov/pubmed/35983156 http://dx.doi.org/10.1155/2022/2500772 |
work_keys_str_mv | AT rayanalanazi analysisofemailspamdetectionusinganovelmachinelearningbasedhybridbaggingtechnique |
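
The description field lists the measures used to evaluate the classifier: accuracy, precision, recall, f-measure, true-negative rate, false-positive rate, and false-negative rate. As a minimal sketch of how these are conventionally derived from a confusion matrix, the Python below treats spam as the positive class. That choice, the `Confusion` class, and the function names are illustrative assumptions and are not taken from the article, which may define or compute its measures differently.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts, assuming spam is the positive class."""
    tp: int  # spam correctly labelled spam
    tn: int  # ham correctly labelled ham
    fp: int  # ham wrongly labelled spam
    fn: int  # spam wrongly labelled ham


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def confusion(y_true, y_pred, positive="spam"):
    """Count the four outcomes over paired true and predicted labels."""
    c = Confusion(tp=0, tn=0, fp=0, fn=0)
    for truth, guess in zip(y_true, y_pred):
        if truth == positive and guess == positive:
            c.tp += 1
        elif truth == positive:
            c.fn += 1
        elif guess == positive:
            c.fp += 1
        else:
            c.tn += 1
    return c


def metrics(c):
    """Derive the seven measures named in the description from the counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        "precision": precision,
        "recall": recall,
        "f_measure": _ratio(2 * precision * recall, precision + recall),
        "true_negative_rate": _ratio(c.tn, c.tn + c.fp),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
        "false_negative_rate": _ratio(c.fn, c.fn + c.tp),
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the article.
    y_true = ["spam", "spam", "ham", "ham", "ham"]
    y_pred = ["spam", "ham", "ham", "spam", "ham"]
    for name, value in metrics(confusion(y_true, y_pred)).items():
        print(f"{name}: {value:.3f}")
```

The false-positive rate and true-negative rate sum to 1, as do the false-negative rate and recall, so reporting all seven measures is partly redundant but makes both kinds of error visible at a glance.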