
Cyber Threat Intelligence-Based Malicious URL Detection Model Using Ensemble Learning

Web applications have become ubiquitous for many business sectors due to their platform independence and low operation cost. Billions of users are visiting these applications to accomplish their daily tasks. However, many of these applications are either vulnerable to web defacement attacks or creat...


Bibliographic Details
Main Authors: Ghaleb, Fuad A., Alsaedi, Mohammed, Saeed, Faisal, Ahmad, Jawad, Alasli, Mohammed
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9101641/
https://www.ncbi.nlm.nih.gov/pubmed/35591061
http://dx.doi.org/10.3390/s22093373
_version_ 1784707136266174464
author Ghaleb, Fuad A.
Alsaedi, Mohammed
Saeed, Faisal
Ahmad, Jawad
Alasli, Mohammed
collection PubMed
description Web applications have become ubiquitous in many business sectors because of their platform independence and low operating cost. Billions of users visit these applications to accomplish their daily tasks. However, many of these applications are either vulnerable to web defacement attacks or created and managed by hackers, as is the case with fraudulent and phishing websites. Detecting malicious websites is essential to prevent the spread of malware and to protect end users from becoming victims. However, most existing solutions rely on extracting features from the website’s content, which can be harmful to the detection machines themselves and is subject to obfuscation. Detecting malicious Uniform Resource Locators (URLs) is safer and more efficient than content analysis. However, the detection of malicious URLs is still not well addressed because of insufficient features and inaccurate classification. This study aims to improve the accuracy of malicious URL detection by designing and developing a cyber threat intelligence-based malicious URL detection model using two-stage ensemble learning. Reports from cybersecurity analysts and users around the globe can provide important information about malicious websites. Therefore, cyber threat intelligence (CTI)-based features extracted from Google searches and Whois websites are used to improve detection performance. The study also proposes a two-stage ensemble learning model that combines the random forest (RF) algorithm for preclassification with a multilayer perceptron (MLP) for final decision making. The trained MLP classifier replaces the majority voting scheme of the three trained random forest classifiers for decision making. The probabilistic outputs of the weak classifiers of the random forests are aggregated and used as input to the MLP classifier. Results show that the extracted CTI-based features with the two-stage classification outperform other studies’ detection models. The proposed CTI-based detection model achieved a 7.8% accuracy improvement and a 6.7% reduction in false-positive rate compared with the traditional URL-based model.
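The two-stage decision described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the per-tree probabilities are aggregated, which features each forest sees, or how the MLP is configured, so the mean aggregation, the `TinyMLP` class, its hyperparameters, and the toy probability values below are all assumptions made up for illustration. The sketch only contrasts majority voting over three stage-one outputs with a learned second-stage combiner.

```python
import math
import random


def aggregate_tree_probs(tree_probs):
    """Assumed aggregation: mean of the per-tree malicious probabilities of one forest."""
    return sum(tree_probs) / len(tree_probs)


def majority_vote(forest_probs, threshold=0.5):
    """Baseline the abstract says is replaced: each forest votes, the majority wins."""
    votes = sum(1 for p in forest_probs if p >= threshold)
    return 1 if votes * 2 > len(forest_probs) else 0


class TinyMLP:
    """Hypothetical stand-in for the stage-two MLP: one tanh hidden layer, sigmoid output."""

    def __init__(self, n_in, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self._forward(x)[1] >= 0.5 else 0

    def fit(self, X, y, epochs=3000, lr=0.1):
        """Plain stochastic gradient descent on the log-loss."""
        for _ in range(epochs):
            for x, t in zip(X, y):
                h, p = self._forward(x)
                dz = p - t  # gradient of the log-loss w.r.t. the output pre-activation
                for j, hj in enumerate(h):
                    dh = dz * self.w2[j] * (1.0 - hj * hj)  # back through tanh
                    self.w2[j] -= lr * dz * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * dh * xi
                    self.b1[j] -= lr * dh
                self.b2 -= lr * dz
        return self


# Toy stage-one outputs (invented values): one row per URL, one aggregated
# malicious probability per forest. In this toy set only the third forest is
# reliable, so majority voting gets some rows wrong.
X = [
    [0.6, 0.7, 0.1],
    [0.4, 0.3, 0.9],
    [0.2, 0.1, 0.2],
    [0.8, 0.9, 0.8],
    [0.7, 0.6, 0.2],
    [0.3, 0.4, 0.8],
]
y = [0, 1, 0, 1, 0, 1]  # 1 = malicious, 0 = benign

mlp = TinyMLP(n_in=3).fit(X, y)
for row, label in zip(X, y):
    print(row, "label:", label, "vote:", majority_vote(row), "mlp:", mlp.predict(row))
```

The point of the design is that a trained combiner can learn how much to trust each stage-one classifier, whereas majority voting weights them equally. In a real pipeline the stage-two model would be trained on held-out stage-one outputs rather than on the same rows it is evaluated on.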
format Online
Article
Text
id pubmed-9101641
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9101641 2022-05-14 Cyber Threat Intelligence-Based Malicious URL Detection Model Using Ensemble Learning Ghaleb, Fuad A. Alsaedi, Mohammed Saeed, Faisal Ahmad, Jawad Alasli, Mohammed Sensors (Basel) Article MDPI 2022-04-28 /pmc/articles/PMC9101641/ /pubmed/35591061 http://dx.doi.org/10.3390/s22093373 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Cyber Threat Intelligence-Based Malicious URL Detection Model Using Ensemble Learning
topic Article