
Adaptive Naive Bayesian Anti-Spam Engine


Bibliographic Details
Main author: Gajewski, W P
Language: English
Published: 2006
Subjects: Computing and Computers
Online access: http://cds.cern.ch/record/1020537

Abstract

Spam has been seriously troubling the Internet community over the last few years and has now reached an alarming scale. Observations made at CERN (the European Organization for Nuclear Research, located in Geneva, Switzerland) show that spam can constitute up to 75% of daily SMTP traffic. A naïve Bayesian classifier (NBC) based on a bag-of-words representation of an email is widely used to stop this unwanted flood, as it combines good performance with simple training and classification. However, because the patterns of spam change constantly, the classifier must be able to adapt online. This work proposes combining such a classifier with a second NBC based on pairs of adjacent words. Only the latter is retrained, using examples of spam reported by users. Tests are performed on large sets of emails from both public spam archives and CERN mailboxes. They suggest that this architecture can increase spam recall without the loss of precision that occurs when only the single-word NBC is retrained. The algorithm's implementation and performance are also re-evaluated with more than a year of hindsight.
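
The two-classifier architecture described in the abstract can be illustrated with a small Python sketch. It is not the author's implementation: the tokenizer, the add-one (Laplace) smoothing, the rule of summing the two classifiers' log-odds, and all names (`NaiveBayes`, `AdaptiveSpamFilter`, `train_offline`, `report_spam`, `is_spam`) are hypothetical choices made for illustration. The single-word model is trained once and then left frozen; user spam reports update only the word-pair model.

```python
import math
import re
from collections import Counter


def unigrams(text):
    """Bag-of-words features: lowercase word tokens (tokenizer is an assumption)."""
    return re.findall(r"\w+", text.lower())


def bigrams(text):
    """Pairs of adjacent words, each pair joined into one feature string."""
    words = unigrams(text)
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, featurize):
        self.featurize = featurize
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(self.featurize(text))
        self.docs[label] += 1

    def log_odds(self, text):
        """log P(spam | text) - log P(ham | text); positive means spam is more likely."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        n_docs = self.docs["spam"] + self.docs["ham"]
        tokens = self.featurize(text)
        score = 0.0
        for label, sign in (("spam", 1.0), ("ham", -1.0)):
            total = sum(self.counts[label].values())
            # Smoothed class prior plus smoothed per-token likelihoods.
            s = math.log((self.docs[label] + 1) / (n_docs + 2))
            for tok in tokens:
                # Counter returns 0 for unseen tokens without inserting them.
                s += math.log((self.counts[label][tok] + 1) / (total + vocab))
            score += sign * s
        return score


class AdaptiveSpamFilter:
    """Hypothetical two-model filter: a frozen word NBC plus an online word-pair NBC."""

    def __init__(self):
        self.words = NaiveBayes(unigrams)  # trained once, never retrained
        self.pairs = NaiveBayes(bigrams)   # the only model updated online

    def train_offline(self, text, label):
        self.words.train(text, label)
        self.pairs.train(text, label)

    def report_spam(self, text):
        # User feedback retrains only the word-pair classifier.
        self.pairs.train(text, "spam")

    def is_spam(self, text):
        # Assumed combination rule: sum the two log-odds and threshold at zero.
        return self.words.log_odds(text) + self.pairs.log_odds(text) > 0.0
```

For example, after `train_offline` on a few labelled messages, a new spam phrasing that both models initially score as neutral can be caught once it has been passed to `report_spam`, while the frozen word model stays untouched. Summing log-odds is only one simple fusion rule; weighting the two models differently would be an equally plausible choice.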

Report number: CERN-OPEN-2007-011