
An intelligent identification and classification system for malicious uniform resource locators (URLs)


Bibliographic Details
Main authors:	Abu Al-Haija, Qasem; Al-Fayoumi, Mustafa
Format:	Online Article Text
Language:	English
Published:	Springer London 2023
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10117275/ https://www.ncbi.nlm.nih.gov/pubmed/37362563 http://dx.doi.org/10.1007/s00521-023-08592-z

author	Abu Al-Haija, Qasem; Al-Fayoumi, Mustafa
collection	PubMed
description	A Uniform Resource Locator (URL) is a unique identifier, composed of a protocol and a domain name, used to locate and retrieve a resource on the Internet. Like any Internet service, URLs (and the websites they point to) can be compromised by attackers, who craft malicious URLs to exploit or damage users’ information and resources. Malicious URLs are usually designed to promote cyber-attacks such as spam, phishing, malware, and defacement. These websites usually require action on the user’s side and can reach users through emails, text messages, pop-ups, or deceptive advertisements. In some cases they can compromise the user’s machine or network, especially when they arrive by email. Developing systems to detect malicious URLs is therefore of great current interest. This paper proposes a high-performance, machine-learning-based system for detecting malicious URLs. The proposed system provides two layers of detection. First, a binary classifier identifies each URL as either benign or malicious. Second, each URL is assigned, based on its features, to one of five classes: benign, spam, phishing, malware, and defacement. Specifically, we report on four ensemble learning approaches: the ensemble of bagging trees (En_Bag), the ensemble of k-nearest neighbors (En_kNN), the ensemble of boosted decision trees (En_Bos), and the ensemble of subspace discriminators (En_Dsc). The approaches have been evaluated on ISCX-URL2016, a comprehensive and contemporary dataset of uniform resource locators. ISCX-URL2016 is a lightweight dataset for detecting and categorizing malicious URLs according to their attack type and lexical analysis. Standard machine learning evaluation metrics are used to measure detection accuracy, precision, recall, F-score, and detection time.
Our experimental assessment indicates that the ensemble of bagging trees (En_Bag) approach achieves better performance than the other ensemble methods, whereas the ensemble of k-nearest neighbors (En_kNN) approach provides the highest inference speed. We also compare our En_Bag model with state-of-the-art solutions and show its superiority in both binary and multiclass classification, with accuracy rates of 99.3% and 97.92%, respectively.
format	Online Article Text
id	pubmed-10117275
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Springer London
record_format	MEDLINE/PubMed
spelling	pubmed-10117275 2023-04-25 An intelligent identification and classification system for malicious uniform resource locators (URLs) Abu Al-Haija, Qasem Al-Fayoumi, Mustafa Neural Comput Appl Original Article Springer London 2023-04-20 /pmc/articles/PMC10117275/ /pubmed/37362563 http://dx.doi.org/10.1007/s00521-023-08592-z Text en © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2023, Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title	An intelligent identification and classification system for malicious uniform resource locators (URLs)
topic	Original Article
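The two detection layers described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. Everything in it is an assumption for illustration: the nine lexical features, the hypothetical names `BaggedKNN`, `TwoLayerDetector`, and `macro_prf`, and the toy URLs are not from the paper, and the code is not the authors' implementation. Bagging is applied to k-NN voters (rather than to decision trees, as in En_Bag) only to keep the example short; it makes no claim to match how En_kNN was built. The abstract also does not say whether the second layer runs only on URLs flagged by the first, so here both layers run independently: layer 1 predicts benign versus malicious, and layer 2 predicts one of the five classes. Accuracy and macro-averaged precision, recall, and F-score stand in for the evaluation measures the abstract lists.

```python
import math
import random
import re
from collections import Counter
from urllib.parse import urlparse

# Five labels named in the abstract; the binary layer collapses the last four.
CLASSES = ["benign", "spam", "phishing", "malware", "defacement"]
_IPV4_HOST = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?")


def lexical_features(url: str) -> list[float]:
    """Hypothetical lexical features; NOT the ISCX-URL2016 feature set."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host, path = parsed.netloc, parsed.path
    return [
        float(len(url)),                             # overall length
        float(len(host)),                            # host length
        float(len(path)),                            # path length
        float(url.count(".")),                       # dots
        float(url.count("-")),                       # hyphens
        float(sum(ch.isdigit() for ch in url)),      # digits
        float(path.count("/")),                      # path depth
        1.0 if _IPV4_HOST.fullmatch(host) else 0.0,  # raw IPv4 host
        float(len(parsed.query)),                    # query-string length
    ]


def knn_vote(train, x, k):
    """Plain k-NN: majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


class BaggedKNN:
    """Bagging over k-NN voters: bootstrap samples plus a majority vote."""

    def __init__(self, n_estimators=5, k=3, seed=0):
        self.n_estimators, self.k, self.seed = n_estimators, k, seed
        self.bags = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        data = list(zip(X, y))
        # Each voter gets a bootstrap sample: len(data) draws with replacement.
        self.bags = [[rng.choice(data) for _ in data] for _ in range(self.n_estimators)]
        return self

    def predict(self, x):
        votes = Counter(knn_vote(bag, x, self.k) for bag in self.bags)
        return votes.most_common(1)[0][0]


class TwoLayerDetector:
    """Layer 1: benign vs. malicious. Layer 2: one of the five classes."""

    def __init__(self, make_model=BaggedKNN):
        self.binary, self.multi = make_model(), make_model()

    def fit(self, urls, labels):
        unknown = set(labels) - set(CLASSES)
        if unknown:
            raise ValueError(f"unexpected labels: {sorted(unknown)}")
        X = [lexical_features(u) for u in urls]
        self.binary.fit(X, ["benign" if lab == "benign" else "malicious" for lab in labels])
        self.multi.fit(X, labels)
        return self

    def detect(self, url):
        x = lexical_features(url)
        return self.binary.predict(x), self.multi.predict(x)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_prf(y_true, y_pred):
    """Macro-averaged (precision, recall, F1) over every label that occurs."""
    pairs, rows = list(zip(y_true, y_pred)), []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        rows.append((prec, rec, 2 * prec * rec / (prec + rec) if prec + rec else 0.0))
    return tuple(sum(row[i] for row in rows) / len(rows) for i in range(3))


if __name__ == "__main__":
    # Made-up toy URLs (reserved example domains/IPs), not ISCX-URL2016 records.
    urls = ["example.com", "example.org/about", "free-prize.example/win-now",
            "bank.example-verify.com/login", "203.0.113.7/payload.exe",
            "example.net/index.php?option=com_content"]
    labels = ["benign", "benign", "spam", "phishing", "malware", "defacement"]
    detector = TwoLayerDetector().fit(urls, labels)
    print(detector.detect("login-verify.example.com/account"))
    predicted = [detector.detect(u)[1] for u in urls]
    print(accuracy(labels, predicted), macro_prf(labels, predicted))
```

Because raw counts such as URL length dominate the Euclidean distance, a practical pipeline would standardize the features first. Macro averaging is only one convention; the abstract does not say which averaging was used for the multiclass scores, and the sketch neither measures detection time nor reproduces the reported accuracies.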


Similar items