Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique

E-mail service providers and consumers find it challenging to distinguish between spam and nonspam e-mails. The purpose of spammers is to spread false information by sending annoying messages that catch the attention of the public. Various spam identification techniques have been suggested and evalu...

Bibliographic Details
Main Author: Rayan, Alanazi
Format: Online Article Text
Language: English
Published: Hindawi 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9381222/
https://www.ncbi.nlm.nih.gov/pubmed/35983156
http://dx.doi.org/10.1155/2022/2500772
_version_ 1784769030573260800
author Rayan, Alanazi
author_facet Rayan, Alanazi
author_sort Rayan, Alanazi
collection PubMed
description E-mail service providers and consumers find it challenging to distinguish between spam and nonspam e-mails. Spammers aim to spread false information by sending annoying messages that catch the public's attention. Various spam identification techniques have been suggested and evaluated in the past, but the results show that more research is required to enhance accuracy and to reduce training time and error rate. Thus, this research proposes a novel machine learning-based hybrid bagging method for e-mail spam identification that combines two machine learning methods: random forest and J48 (decision tree). The proposed framework categorizes e-mail into ham and spam. In this procedure, the database is split into multiple sets, which are provided as input to each method. Tokenization, stemming, and stop word removal are performed in the preprocessing stage. Correlation feature selection (CFS) is then employed to select the required features from the preprocessed data. The effectiveness of the presented method is evaluated in terms of true-negative rate, accuracy, recall, precision, false-positive rate, f-measure, and false-negative rate; the outcomes of three studies are compared. According to the results, the presented hybrid bagged model-based SMD technology achieved 98 percent accuracy.
format Online
Article
Text
id pubmed-9381222
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-9381222 2022-08-17 Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique Rayan, Alanazi Comput Intell Neurosci Research Article Hindawi 2022-08-09 /pmc/articles/PMC9381222/ /pubmed/35983156 http://dx.doi.org/10.1155/2022/2500772 Text en Copyright © 2022 Alanazi Rayan. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Rayan, Alanazi
Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique
title Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique
title_full Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique
title_fullStr Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique
title_full_unstemmed Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique
title_short Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique
title_sort analysis of e-mail spam detection using a novel machine learning-based hybrid bagging technique
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9381222/
https://www.ncbi.nlm.nih.gov/pubmed/35983156
http://dx.doi.org/10.1155/2022/2500772
work_keys_str_mv AT rayanalanazi analysisofemailspamdetectionusinganovelmachinelearningbasedhybridbaggingtechnique
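
The description above lists seven evaluation measures (accuracy, precision, recall, f-measure, true-negative rate, false-positive rate, false-negative rate) without defining them. The following is a minimal sketch of their standard confusion-matrix definitions, not the article's own code. It assumes "spam" is the positive class; the names Confusion, confusion, and metrics are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # spam correctly flagged as spam
    fp: int  # ham wrongly flagged as spam
    tn: int  # ham correctly passed as ham
    fn: int  # spam wrongly passed as ham


def confusion(y_true, y_pred, positive="spam"):
    """Count the four confusion-matrix cells (assumes 'spam' is positive)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return Confusion(tp, fp, tn, fn)


def _ratio(num, den):
    # Return 0.0 instead of raising when a class is absent from the data.
    return num / den if den else 0.0


def metrics(c):
    """Standard definitions of the seven measures named in the description."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f_measure": _ratio(2 * precision * recall, precision + recall),
        "true_negative_rate": _ratio(c.tn, c.tn + c.fp),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
        "false_negative_rate": _ratio(c.fn, c.fn + c.tp),
    }


if __name__ == "__main__":
    # Toy labels invented for illustration; they are not data from the article.
    y_true = ["spam", "spam", "ham", "ham", "ham"]
    y_pred = ["spam", "ham", "ham", "spam", "ham"]
    for name, value in metrics(confusion(y_true, y_pred)).items():
        print(f"{name}: {value:.3f}")
```

The false-positive rate is the complement of the true-negative rate, and the false-negative rate is the complement of recall, so the seven measures carry four independent quantities.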