UMUDGA: A dataset for profiling algorithmically generated domain names in botnet detection

In computer security, botnets still represent a significant cyber threat. Concealment techniques such as dynamic addressing and domain generation algorithms (DGAs) require an improved and more effective detection process. To this end, this data descriptor presents a collection of over 30 m...

Bibliographic Details
Main Authors: Zago, Mattia, Gil Pérez, Manuel, Martínez Pérez, Gregorio
Format: Online Article Text
Language: English
Published: Elsevier 2020
Subjects: Computer Science
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7090278/
https://www.ncbi.nlm.nih.gov/pubmed/32215308
http://dx.doi.org/10.1016/j.dib.2020.105400
author Zago, Mattia
Gil Pérez, Manuel
Martínez Pérez, Gregorio
collection PubMed
description In computer security, botnets still represent a significant cyber threat. Concealment techniques such as dynamic addressing and domain generation algorithms (DGAs) require an improved and more effective detection process. To this end, this data descriptor presents a collection of over 30 million manually labeled, algorithmically generated domain names, accompanied by a feature set ready to use for machine learning (ML) analysis. The dataset has been co-submitted with the research article "UMUDGA: a dataset for profiling DGA-based botnet" [1], and it aims to let researchers move past the data collection, organization, and pre-processing phases, so that they can focus on the analysis and the production of ML-powered solutions for network intrusion detection. To be as exhaustive as possible, we selected 50 of the most notorious malware variants. Each family is available both as a list of domains (generated by executing the malware DGAs in a controlled environment with fixed parameters) and as a collection of features (generated by extracting a combination of statistical and natural language processing metrics).
format Online
Article
Text
id pubmed-7090278
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-70902782020-03-25 UMUDGA: A dataset for profiling algorithmically generated domain names in botnet detection Zago, Mattia Gil Pérez, Manuel Martínez Pérez, Gregorio Data Brief Computer Science Elsevier 2020-03-09 /pmc/articles/PMC7090278/ /pubmed/32215308 http://dx.doi.org/10.1016/j.dib.2020.105400 Text en © 2020 The Author(s) http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title UMUDGA: A dataset for profiling algorithmically generated domain names in botnet detection
topic Computer Science