Malicious and Benign Webpages Dataset

Web security is a challenging task amid ever-rising threats on the Internet. With billions of websites active on the Internet, and hackers developing newer techniques to trap web users, machine learning offers promising techniques to detect malicious websites. The dataset described in this manuscript is...

Bibliographic Details
Main author:	Singh, A.K.
Format:	Online Article Text
Language:	English
Published:	Elsevier 2020
Subjects:	Data Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7648114/ https://www.ncbi.nlm.nih.gov/pubmed/33204771 http://dx.doi.org/10.1016/j.dib.2020.106304

_version_	1783607049579397120
author	Singh, A.K.
author_facet	Singh, A.K.
author_sort	Singh, A.K.
collection	PubMed
description	Web security is a challenging task amid ever-rising threats on the Internet. With billions of websites active on the Internet, and hackers developing newer techniques to trap web users, machine learning offers promising techniques to detect malicious websites. The dataset described in this manuscript is meant for such machine-learning-based analysis of malicious and benign webpages. The data was collected from the Internet using a specialized focused web crawler named MalCrawler [1]. The dataset comprises various extracted attributes as well as raw webpage content, including JavaScript code. It supports both supervised and unsupervised learning. For supervised learning, class labels for malicious and benign webpages have been added to the dataset using the Google Safe Browsing API. The most relevant attributes within the scope have already been extracted and included in this dataset. However, the raw web content, including JavaScript code, supports further attribute extraction if desired. This raw content and code can also be used as unstructured data input for text-based analytics. The dataset consists of data from approximately 1.5 million webpages, which makes it suitable for deep learning algorithms. This article also provides code snippets used for data extraction and analysis.
format	Online Article Text
id	pubmed-7648114
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-7648114 2020-11-16 Malicious and Benign Webpages Dataset Singh, A.K. Data Brief Data Article Elsevier 2020-09-12 /pmc/articles/PMC7648114/ /pubmed/33204771 http://dx.doi.org/10.1016/j.dib.2020.106304 Text en © 2020 The Author. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Data Article Singh, A.K. Malicious and Benign Webpages Dataset
title	Malicious and Benign Webpages Dataset
title_full	Malicious and Benign Webpages Dataset
title_fullStr	Malicious and Benign Webpages Dataset
title_full_unstemmed	Malicious and Benign Webpages Dataset
title_short	Malicious and Benign Webpages Dataset
title_sort	malicious and benign webpages dataset
topic	Data Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7648114/ https://www.ncbi.nlm.nih.gov/pubmed/33204771 http://dx.doi.org/10.1016/j.dib.2020.106304
work_keys_str_mv	AT singhak maliciousandbenignwebpagesdataset

Similar Items