Detection of Algorithmically Generated Domain Names Using the Recurrent Convolutional Neural Network with Spatial Pyramid Pooling

Domain generation algorithms (DGAs) use specific parameters as random seeds to generate a large number of random domain names to prevent malicious domain name detection. This greatly increases the difficulty of detecting and defending against botnets and malware. Traditional models for detecting alg...

Bibliographic Details
Main authors: Liu, Zhanghui; Zhang, Yudong; Chen, Yuzhong; Fan, Xinwen; Dong, Chen
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7597131/
https://www.ncbi.nlm.nih.gov/pubmed/33286827
http://dx.doi.org/10.3390/e22091058
author Liu, Zhanghui
Zhang, Yudong
Chen, Yuzhong
Fan, Xinwen
Dong, Chen
collection PubMed
description Domain generation algorithms (DGAs) use specific parameters as random seeds to generate a large number of random domain names to evade malicious domain name detection. This greatly increases the difficulty of detecting and defending against botnets and malware. Traditional models for detecting algorithmically generated domain names generally rely on manually extracting statistical characteristics from the domain names or network traffic and then employing classifiers to distinguish the algorithmically generated domain names. These models always require labor-intensive manual feature engineering. In contrast, most state-of-the-art models based on deep neural networks are sensitive to imbalance in the sample distribution and cannot fully exploit the discriminative class features in domain names or network traffic, leading to decreased detection accuracy. To address these issues, we employ the borderline synthetic minority over-sampling algorithm (SMOTE) to improve sample balance. We also propose a recurrent convolutional neural network with spatial pyramid pooling (RCNN-SPP) to extract discriminative and distinctive class features. The recurrent convolutional neural network combines a convolutional neural network (CNN) and a bi-directional long short-term memory network (Bi-LSTM) to extract both the semantic and contextual information from domain names. We then employ the spatial pyramid pooling strategy to refine the contextual representation by capturing multi-scale contextual information from domain names. The experimental results from different domain name datasets demonstrate that our model can achieve 92.36% accuracy, an 89.55% recall rate, a 90.46% F1-score, and 95.39% AUC in identifying DGA and legitimate domain names, and that it can achieve 92.45% accuracy, a 90.12% recall rate, a 90.86% F1-score, and 96.59% AUC in multi-classification problems. It achieves significant improvement over existing models in terms of accuracy and robustness.
format Online
Article
Text
id pubmed-7597131
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7597131 2020-11-09 Detection of Algorithmically Generated Domain Names Using the Recurrent Convolutional Neural Network with Spatial Pyramid Pooling Liu, Zhanghui Zhang, Yudong Chen, Yuzhong Fan, Xinwen Dong, Chen Entropy (Basel) Article MDPI 2020-09-22 /pmc/articles/PMC7597131/ /pubmed/33286827 http://dx.doi.org/10.3390/e22091058 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Detection of Algorithmically Generated Domain Names Using the Recurrent Convolutional Neural Network with Spatial Pyramid Pooling
topic Article
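
Illustrative note on the description field: the abstract says spatial pyramid pooling is used to capture multi-scale contextual information from domain names, but it does not give the pyramid levels, the pooling operator, or the feature dimensions. The sketch below is therefore only a generic one-dimensional spatial pyramid max pooling, not the authors' implementation. The function name `spatial_pyramid_pool`, the default `levels=(1, 2, 4)`, and the use of max pooling are all assumptions made for illustration. It shows the general idea: a sequence of per-character feature vectors of any length is pooled over 1, 2, and 4 bins and concatenated into a vector whose size does not depend on the length of the domain name.

```python
from typing import List, Sequence


def spatial_pyramid_pool(
    features: Sequence[Sequence[float]],
    levels: Sequence[int] = (1, 2, 4),  # assumed pyramid levels, not from the paper
) -> List[float]:
    """Max-pool a variable-length sequence of feature vectors to a fixed size.

    `features` holds one feature vector per position (for example, per
    character of a domain name). For each pyramid level with `bins` bins,
    the sequence is split into `bins` windows and each channel is
    max-pooled inside each window. The result always has
    len(features[0]) * sum(levels) entries.
    """
    if not features:
        raise ValueError("features must contain at least one position")
    length = len(features)
    channels = len(features[0])
    pooled: List[float] = []
    for bins in levels:
        for b in range(bins):
            # Floor for the start and ceiling for the end keep every
            # window non-empty, even when the sequence is shorter than
            # the number of bins (windows then overlap).
            start = (b * length) // bins
            end = ((b + 1) * length + bins - 1) // bins
            window = features[start:end]
            for c in range(channels):
                pooled.append(max(step[c] for step in window))
    return pooled
```

With the default levels, a 3-position input and a 7-position input with two channels each both yield a 14-element vector (2 channels times 1 + 2 + 4 bins), which is what allows a fixed-size classifier layer to follow inputs of varying length.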