
An Assessment of Lexical, Network, and Content-Based Features for Detecting Malicious URLs Using Machine Learning and Deep Learning Models

The World Wide Web services are essential in our daily lives and are available to communities through Uniform Resource Locator (URL). Attackers utilize such means of communication and create malicious URLs to conduct fraudulent activities and deceive others by creating deceptive and misleading websi...


Bibliographic Details
Main authors:	Aljabri, Malak; Alhaidari, Fahd; Mohammad, Rami Mustafa A.; Samiha Mirza; Alhamed, Dina H.; Altamimi, Hanan S.; Chrouf, Sara Mhd. Bachar
Format:	Online Article Text
Language:	English
Published:	Hindawi 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9436524/ https://www.ncbi.nlm.nih.gov/pubmed/36059391 http://dx.doi.org/10.1155/2022/3241216

_version_	1784781384455290880
author	Aljabri, Malak; Alhaidari, Fahd; Mohammad, Rami Mustafa A.; Samiha Mirza; Alhamed, Dina H.; Altamimi, Hanan S.; Chrouf, Sara Mhd. Bachar
author_sort	Aljabri, Malak
collection	PubMed
description	World Wide Web services are essential in our daily lives and are available to communities through Uniform Resource Locators (URLs). Attackers exploit this means of communication by creating malicious URLs to conduct fraudulent activities and deceive others through deceptive and misleading websites and domains. Such threats open the door to many critical attacks such as spam, spyware, phishing, and malware. Therefore, detecting malicious URLs is crucial to preventing many cybercriminal activities. In this study, we examined a set of machine learning (ML) and deep learning (DL) models to detect malicious websites using a dataset comprising 66,506 URL records. We engineered three types of features: lexical-based, network-based, and content-based. To extract the most discriminative features in the dataset, we applied several feature selection algorithms, namely, correlation analysis, Analysis of Variance (ANOVA), and chi-square. Finally, we conducted a comparative performance evaluation of several ML and DL models using a set of criteria commonly used to evaluate such models. The results showed that Naïve Bayes (NB) was the best model for detecting malicious URLs on this dataset, with an accuracy of 96%. This research contributes to the field by conducting substantial feature engineering and analysis to identify the best features for malicious URL prediction, comparing different models, and achieving high accuracy on a large new URL dataset.
format	Online Article Text
id	pubmed-9436524
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Hindawi
record_format	MEDLINE/PubMed
spelling	pubmed-9436524 2022-09-02 Comput Intell Neurosci Research Article Hindawi 2022-08-25 /pmc/articles/PMC9436524/ /pubmed/36059391 http://dx.doi.org/10.1155/2022/3241216 Text en Copyright © 2022 Malak Aljabri et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	An Assessment of Lexical, Network, and Content-Based Features for Detecting Malicious URLs Using Machine Learning and Deep Learning Models
title_sort	assessment of lexical, network, and content-based features for detecting malicious urls using machine learning and deep learning models
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9436524/ https://www.ncbi.nlm.nih.gov/pubmed/36059391 http://dx.doi.org/10.1155/2022/3241216
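
The description in this record mentions lexical-based URL features but does not list them. As a rough illustration only, the sketch below computes a handful of lexical features from a URL string using the Python standard library. The function name `lexical_features` and every feature in it are assumptions chosen for illustration; they are not the feature set used by the article's authors.

```python
from urllib.parse import urlsplit
import ipaddress


def lexical_features(url: str) -> dict:
    """Return a few illustrative lexical features of a URL string.

    Hypothetical feature set for demonstration; not taken from the article.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""

    # A raw IP address in place of a domain name is a commonly cited signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        # Number of non-empty path segments, e.g. /a/b.php has depth 2.
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "uses_https": parts.scheme == "https",
    }


if __name__ == "__main__":
    for name, value in lexical_features(
        "http://192.168.0.1/login/verify-account.php"
    ).items():
        print(f"{name}: {value}")
```

Features of this kind depend only on the URL text, so they can be computed without contacting the host; network-based and content-based features, by contrast, would require lookups or page retrieval.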
