Phishing Website Detection Based on Deep Convolutional Neural Network and Random Forest Ensemble Learning

Phishing has become one of the biggest and most effective cyber threats, causing hundreds of millions of dollars in losses and millions of data breaches every year. Currently, anti-phishing techniques require experts to extract phishing sites features and use third-party services to detect phishing...

Bibliographic Details
Main authors: Yang, Rundong; Zheng, Kangfeng; Wu, Bin; Wu, Chunhua; Wang, Xiujuan
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8709380/
https://www.ncbi.nlm.nih.gov/pubmed/34960375
http://dx.doi.org/10.3390/s21248281
author Yang, Rundong
Zheng, Kangfeng
Wu, Bin
Wu, Chunhua
Wang, Xiujuan
collection PubMed
description Phishing has become one of the biggest and most effective cyber threats, causing hundreds of millions of dollars in losses and millions of data breaches every year. Currently, anti-phishing techniques require experts to extract phishing sites features and use third-party services to detect phishing sites. These techniques have some limitations, one of which is that extracting phishing features requires expertise and is time-consuming. Second, the use of third-party services delays the detection of phishing sites. Hence, this paper proposes an integrated phishing website detection method based on convolutional neural networks (CNN) and random forest (RF). The method can predict the legitimacy of URLs without accessing the web content or using third-party services. The proposed technique uses character embedding techniques to convert URLs into fixed-size matrices, extract features at different levels using CNN models, classify multi-level features using multiple RF classifiers, and, finally, output prediction results using a winner-take-all approach. On our dataset, a 99.35% accuracy rate was achieved using the proposed model. An accuracy rate of 99.26% was achieved on the benchmark data, much higher than that of the existing extreme model.
format Online
Article
Text
id pubmed-8709380
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8709380 2021-12-25 Phishing Website Detection Based on Deep Convolutional Neural Network and Random Forest Ensemble Learning Yang, Rundong Zheng, Kangfeng Wu, Bin Wu, Chunhua Wang, Xiujuan Sensors (Basel) Article MDPI 2021-12-10 /pmc/articles/PMC8709380/ /pubmed/34960375 http://dx.doi.org/10.3390/s21248281 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Phishing Website Detection Based on Deep Convolutional Neural Network and Random Forest Ensemble Learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8709380/
https://www.ncbi.nlm.nih.gov/pubmed/34960375
http://dx.doi.org/10.3390/s21248281
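
The description field in this record outlines a pipeline: URLs are converted into fixed-size matrices at the character level, a CNN extracts features at several levels, one RF classifier is trained per level, and a winner-take-all rule picks the final answer. The Python sketch below illustrates only the first and last of those steps, and it rests on assumptions that are not in the record: the alphabet, the length `MAX_LEN`, and the helper names are hypothetical; one-hot rows stand in for the learned character embedding; and "winner-take-all" is read here as "take the prediction of the single most confident classifier". The CNN and RF stages are omitted.

```python
from typing import List, Sequence, Tuple

# Hypothetical alphabet and length; the record does not give the paper's values.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
MAX_LEN = 32
PAD = 0       # index reserved for padding
UNKNOWN = 1   # index reserved for characters outside ALPHABET
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(ALPHABET)}
VOCAB_SIZE = len(ALPHABET) + 2


def encode_url(url: str, max_len: int = MAX_LEN) -> List[int]:
    """Map a URL to a fixed-length list of character indices.

    Longer URLs are truncated; shorter ones are right-padded with PAD.
    """
    indices = [CHAR_TO_INDEX.get(ch, UNKNOWN) for ch in url.lower()[:max_len]]
    return indices + [PAD] * (max_len - len(indices))


def one_hot(indices: Sequence[int], vocab_size: int = VOCAB_SIZE) -> List[List[int]]:
    """Expand indices into a len(indices) x vocab_size matrix of 0s and 1s.

    This is a stand-in for a learned embedding, not the paper's embedding.
    """
    return [[1 if col == idx else 0 for col in range(vocab_size)] for idx in indices]


def winner_take_all(probabilities: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (class, confidence) from the single most confident classifier.

    probabilities[i][c] is classifier i's probability for class c.
    Assumed interpretation of "winner-take-all"; ties keep the first maximum.
    """
    best_class, best_conf = 0, -1.0
    for per_class in probabilities:
        for cls, p in enumerate(per_class):
            if p > best_conf:
                best_class, best_conf = cls, p
    return best_class, best_conf
```

As a usage example, `one_hot(encode_url("http://example.com/login"))` yields a 32 x 61 matrix under these assumed settings, and `winner_take_all([[0.6, 0.4], [0.1, 0.9], [0.7, 0.3]])` returns `(1, 0.9)` because the second classifier is the most confident.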