Cargando…

Unsupervised, low latency anomaly detection of algorithmically generated domain names by generative probabilistic modeling

We propose a method for detecting anomalous domain names, with focus on algorithmically generated domain names which are frequently associated with malicious activities such as fast flux service networks, particularly for bot networks (or botnets), malware, and phishing. Our method is based on learn...

Descripción completa

Detalles Bibliográficos
Autores principales:	Raghuram, Jayaram, Miller, David J., Kesidis, George
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Elsevier 2014
Materias:	Original Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4294760/ https://www.ncbi.nlm.nih.gov/pubmed/25685511 http://dx.doi.org/10.1016/j.jare.2014.01.001

_version_	1782352764990390272
author	Raghuram, Jayaram Miller, David J. Kesidis, George
author_facet	Raghuram, Jayaram Miller, David J. Kesidis, George
author_sort	Raghuram, Jayaram
collection	PubMed
description	We propose a method for detecting anomalous domain names, with focus on algorithmically generated domain names which are frequently associated with malicious activities such as fast flux service networks, particularly for bot networks (or botnets), malware, and phishing. Our method is based on learning a (null hypothesis) probability model based on a large set of domain names that have been white listed by some reliable authority. Since these names are mostly assigned by humans, they are pronounceable, and tend to have a distribution of characters, words, word lengths, and number of words that are typical of some language (mostly English), and often consist of words drawn from a known lexicon. On the other hand, in the present day scenario, algorithmically generated domain names typically have distributions that are quite different from that of human-created domain names. We propose a fully generative model for the probability distribution of benign (white listed) domain names which can be used in an anomaly detection setting for identifying putative algorithmically generated domain names. Unlike other methods, our approach can make detections without considering any additional (latency producing) information sources, often used to detect fast flux activity. Experiments on a publicly available, large data set of domain names associated with fast flux service networks show encouraging results, relative to several baseline methods, with higher detection rates and low false positive rates.
format	Online Article Text
id	pubmed-4294760
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-42947602015-02-14 Unsupervised, low latency anomaly detection of algorithmically generated domain names by generative probabilistic modeling Raghuram, Jayaram Miller, David J. Kesidis, George J Adv Res Original Article We propose a method for detecting anomalous domain names, with focus on algorithmically generated domain names which are frequently associated with malicious activities such as fast flux service networks, particularly for bot networks (or botnets), malware, and phishing. Our method is based on learning a (null hypothesis) probability model based on a large set of domain names that have been white listed by some reliable authority. Since these names are mostly assigned by humans, they are pronounceable, and tend to have a distribution of characters, words, word lengths, and number of words that are typical of some language (mostly English), and often consist of words drawn from a known lexicon. On the other hand, in the present day scenario, algorithmically generated domain names typically have distributions that are quite different from that of human-created domain names. We propose a fully generative model for the probability distribution of benign (white listed) domain names which can be used in an anomaly detection setting for identifying putative algorithmically generated domain names. Unlike other methods, our approach can make detections without considering any additional (latency producing) information sources, often used to detect fast flux activity. Experiments on a publicly available, large data set of domain names associated with fast flux service networks show encouraging results, relative to several baseline methods, with higher detection rates and low false positive rates. Elsevier 2014-07 2014-01-09 /pmc/articles/PMC4294760/ /pubmed/25685511 http://dx.doi.org/10.1016/j.jare.2014.01.001 Text en © 2014 Production and hosting by Elsevier B.V. http://creativecommons.org/licenses/by-nc-nd/3.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
spellingShingle	Original Article Raghuram, Jayaram Miller, David J. Kesidis, George Unsupervised, low latency anomaly detection of algorithmically generated domain names by generative probabilistic modeling
title	Unsupervised, low latency anomaly detection of algorithmically generated domain names by generative probabilistic modeling
title_full	Unsupervised, low latency anomaly detection of algorithmically generated domain names by generative probabilistic modeling
title_fullStr	Unsupervised, low latency anomaly detection of algorithmically generated domain names by generative probabilistic modeling
title_full_unstemmed	Unsupervised, low latency anomaly detection of algorithmically generated domain names by generative probabilistic modeling
title_short	Unsupervised, low latency anomaly detection of algorithmically generated domain names by generative probabilistic modeling
title_sort	unsupervised, low latency anomaly detection of algorithmically generated domain names by generative probabilistic modeling
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4294760/ https://www.ncbi.nlm.nih.gov/pubmed/25685511 http://dx.doi.org/10.1016/j.jare.2014.01.001
work_keys_str_mv	AT raghuramjayaram unsupervisedlowlatencyanomalydetectionofalgorithmicallygenerateddomainnamesbygenerativeprobabilisticmodeling AT millerdavidj unsupervisedlowlatencyanomalydetectionofalgorithmicallygenerateddomainnamesbygenerativeprobabilisticmodeling AT kesidisgeorge unsupervisedlowlatencyanomalydetectionofalgorithmicallygenerateddomainnamesbygenerativeprobabilisticmodeling

Unsupervised, low latency anomaly detection of algorithmically generated domain names by generative probabilistic modeling

Ejemplares similares