Adaptive Naive Bayesian Anti-Spam Engine

Spam has seriously troubled the Internet community over the last few years and has now reached an alarming scale. Observations made at CERN (European Organization for Nuclear Research located in Geneva, Switzerland) show that spam mails can constitute up to 75% of daily SMTP...

Bibliographic Details
Main author:	Gajewski, W P
Language:	eng
Published:	2006
Subjects:	Computing and Computers
Online access:	http://cds.cern.ch/record/1020537

_version_	1780912113538039808
author	Gajewski, W P
collection	CERN
description	Spam has seriously troubled the Internet community over the last few years and has now reached an alarming scale. Observations made at CERN (European Organization for Nuclear Research located in Geneva, Switzerland) show that spam mails can constitute up to 75% of daily SMTP traffic. A naïve Bayesian classifier based on a Bag of Words representation of an email is widely used to stop this unwanted flood, as it combines good performance with simple training and classification processes. However, because spam patterns change constantly, the classifier must be able to adapt online. This work proposes combining such a classifier with another NBC (naïve Bayesian classifier) based on pairs of adjacent words. Only the latter is retrained with examples of spam reported by users. Tests are performed on sizeable sets of mails from both public spam archives and CERN mailboxes. They suggest that this architecture can increase spam recall without degrading the classifier's precision, which is what happens when only the NBC based on single words is retrained. The algorithm's implementation and performance are re-evaluated with the hindsight of more than a year.
id	cern-1020537
institution	European Organization for Nuclear Research
language	eng
publishDate	2006
record_format	invenio
spelling	cern-1020537 2019-09-30T06:29:59Z http://cds.cern.ch/record/1020537 eng Gajewski, W P Adaptive Naive Bayesian Anti-Spam Engine Computing and Computers CERN-OPEN-2007-011 oai:cds.cern.ch:1020537 2006-05-17
title	Adaptive Naive Bayesian Anti-Spam Engine
topic	Computing and Computers
url	http://cds.cern.ch/record/1020537
work_keys_str_mv	AT gajewskiwp adaptivenaivebayesianantispamengine
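
The architecture summarized in the description field can be illustrated with a minimal sketch. It is not the author's implementation: the whitespace tokenizer, the add-one smoothing, the rule of summing the two log-odds scores, and all names (NaiveBayes, CombinedFilter, report_spam) are assumptions made for illustration only. What it does take from the abstract is the structure: one naive Bayesian classifier over single words, a second one over pairs of adjacent words, and retraining of only the second one on user-reported spam.

```python
import math
from collections import Counter


def unigrams(text):
    """Bag-of-words features: lowercase whitespace tokens (assumed tokenizer)."""
    return text.lower().split()


def bigrams(text):
    """Features made of pairs of adjacent words."""
    words = text.lower().split()
    return [a + " " + b for a, b in zip(words, words[1:])]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed choice)."""

    def __init__(self, featurize):
        self.featurize = featurize
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(self.featurize(text))
        self.docs[label] += 1

    def log_odds(self, text):
        """Return log P(spam | text) - log P(ham | text); > 0 leans spam."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        total_docs = self.docs["spam"] + self.docs["ham"]
        feats = self.featurize(text)
        score = 0.0
        for label, sign in (("spam", 1.0), ("ham", -1.0)):
            # Smoothed class prior.
            score += sign * math.log((self.docs[label] + 1) / (total_docs + 2))
            # Smoothed per-feature likelihoods.
            denom = sum(self.counts[label].values()) + vocab
            for feat in feats:
                score += sign * math.log((self.counts[label][feat] + 1) / denom)
        return score


class CombinedFilter:
    """Single-word NBC plus adjacent-word-pair NBC (hypothetical wrapper)."""

    def __init__(self):
        self.words = NaiveBayes(unigrams)
        self.pairs = NaiveBayes(bigrams)

    def train(self, text, label):
        """Initial training updates both classifiers."""
        self.words.train(text, label)
        self.pairs.train(text, label)

    def report_spam(self, text):
        """User-reported spam retrains only the word-pair classifier."""
        self.pairs.train(text, "spam")

    def is_spam(self, text):
        # Assumed combination rule: add the two log-odds scores.
        return self.words.log_odds(text) + self.pairs.log_odds(text) > 0
```

Keeping the single-word model fixed after initial training while feeding user reports only into the word-pair model mirrors the split described in the abstract. How the two scores are actually combined in the original work is not stated in this record.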
