An intelligent identification and classification system for malicious uniform resource locators (URLs)

Bibliographic Details
Main Authors: Abu Al-Haija, Qasem; Al-Fayoumi, Mustafa
Format: Online article (text)
Language: English
Published: Springer London, 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10117275/
https://www.ncbi.nlm.nih.gov/pubmed/37362563
http://dx.doi.org/10.1007/s00521-023-08592-z
Collection: PubMed
Abstract: A Uniform Resource Locator (URL) is a unique identifier, composed of a protocol and a domain name (among other components), that is used to locate and retrieve a resource on the Internet. Like any Internet service, URLs and the websites they point to can be abused by attackers, who craft malicious URLs to exploit or damage users' information and resources. Malicious URLs are usually designed to promote cyber-attacks such as spam, phishing, malware, and defacement. They typically require some action on the user's side and can reach users through emails, text messages, pop-ups, or deceptive advertisements. In some cases, especially when they arrive by email, they can compromise the user's machine or network. Developing systems that detect malicious URLs is therefore of great current interest. This paper proposes a high-performance, machine learning-based system for detecting malicious URLs. The system provides two layers of detection. First, a binary classifier labels each URL as either benign or malicious. Second, URLs are classified by their features into five classes: benign, spam, phishing, malware, and defacement. Specifically, we report on four ensemble learning approaches: an ensemble of bagging trees (En_Bag), an ensemble of k-nearest neighbors (En_kNN), an ensemble of boosted decision trees (En_Bos), and an ensemble of subspace discriminators (En_Dsc). The approaches are evaluated on ISCX-URL2016, a comprehensive and contemporary URL dataset. ISCX-URL2016 is a lightweight dataset for detecting malicious URLs and categorizing them by attack type, based on lexical analysis. Standard machine learning evaluation metrics are used: detection accuracy, precision, recall, F-score, and detection time.
Our experimental assessment indicates that the ensemble of bagging trees (En_Bag) approach outperforms the other ensemble methods, whereas the ensemble of k-nearest neighbors (En_kNN) approach achieves the highest inference speed. We also compare our En_Bag model with state-of-the-art solutions and show that it outperforms them in both binary and multi-class classification, with accuracy rates of 99.3% and 97.92%, respectively.
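The abstract defines a URL by its protocol and domain name and notes that ISCX-URL2016 categorizes URLs through lexical analysis, that is, from the URL string itself rather than from page content. As a minimal sketch, assuming Python's standard `urllib.parse` and `ipaddress` modules, the function below splits a URL into those components and derives a few lexical features. The function name `lexical_features` and every feature in it are hypothetical illustrations, not the ISCX-URL2016 feature set or the authors' extractor.

```python
import ipaddress
from urllib.parse import urlsplit


def lexical_features(raw_url: str) -> dict:
    """Return a few illustrative lexical features of a URL string.

    The feature names below are hypothetical examples chosen for
    illustration; they are not the ISCX-URL2016 feature set.
    """
    # urlsplit only finds the host when a scheme is present, so assume
    # http:// for bare inputs such as "example.com/login".
    url = raw_url if "://" in raw_url else "http://" + raw_url
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lower-cased, port removed

    # A raw IP address in place of a domain name is a common lexical cue.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    path_tokens = [tok for tok in parts.path.split("/") if tok]
    return {
        "scheme": parts.scheme,              # protocol component
        "host": host,                        # domain name (or IP) component
        "url_length": len(raw_url),          # length of the string as given
        "host_length": len(host),
        "host_dot_count": host.count("."),
        "path_depth": len(path_tokens),      # number of non-empty path segments
        "digit_count": sum(ch.isdigit() for ch in raw_url),
        "has_query": bool(parts.query),
        "host_is_ip": host_is_ip,
    }


if __name__ == "__main__":
    print(lexical_features("http://198.51.100.7/login/verify.php?id=42"))
```

For the sample URL above, which uses a documentation-range IP address as its host, the sketch reports `host_is_ip` as true, a path depth of 2, and 11 digits. A numeric feature vector of this kind is what an ensemble classifier would consume.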
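The abstract evaluates both the binary (benign vs. malicious) task and the five-class task with accuracy, precision, recall, and F-score. The sketch below shows how these measures are conventionally computed from true and predicted labels, using only the Python standard library: precision = TP / (TP + FP), recall = TP / (TP + FN), and F-score = 2PR / (P + R), computed per class. The function name `evaluate`, the label lists, the choice of macro averaging, and the rule that a zero denominator yields 0.0 are all assumptions for illustration. The abstract does not say how the authors averaged per-class scores.

```python
# Label sets mirroring the two tasks named in the abstract (names assumed).
BINARY_LABELS = ["benign", "malicious"]
MULTICLASS_LABELS = ["benign", "spam", "phishing", "malware", "defacement"]


def evaluate(y_true, y_pred, labels):
    """Accuracy, per-class (precision, recall, F-score), and their macro averages.

    Macro averaging (an unweighted mean over classes) is an assumption here;
    a support-weighted average is another common choice.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)  # correctly flagged as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly flagged as c
        fn = sum(t == c and p != c for t, p in pairs)  # instances of c that were missed
        # Convention assumed here: a ratio with a zero denominator counts as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        per_class[c] = (precision, recall, f_score)

    # Unweighted mean of each measure across all listed classes.
    macro = tuple(sum(scores[i] for scores in per_class.values()) / len(labels)
                  for i in range(3))
    return accuracy, per_class, macro


if __name__ == "__main__":
    truth = ["benign", "benign", "malicious", "malicious"]
    preds = ["benign", "malicious", "malicious", "malicious"]
    print(evaluate(truth, preds, BINARY_LABELS))
```

In the binary demo, three of four predictions are correct, so accuracy is 0.75. The "benign" class has precision 1.0 and recall 0.5. Detection time, the fifth measure in the abstract, is not covered here. It could be measured separately by timing the classifier's prediction calls, for example with `time.perf_counter`.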
Record ID: pubmed-10117275
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Neural Computing and Applications (Neural Comput Appl), 2023-04-20
Rights: © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
Topic: Original Article