Phishing Website Detection Based on Deep Convolutional Neural Network and Random Forest Ensemble Learning

Phishing has become one of the biggest and most effective cyber threats, causing hundreds of millions of dollars in losses and millions of data breaches every year. Currently, anti-phishing techniques require experts to extract phishing-site features and use third-party services to detect phishing...

Bibliographic Details
Main Authors:	Yang, Rundong; Zheng, Kangfeng; Wu, Bin; Wu, Chunhua; Wang, Xiujuan
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8709380/ https://www.ncbi.nlm.nih.gov/pubmed/34960375 http://dx.doi.org/10.3390/s21248281

author	Yang, Rundong Zheng, Kangfeng Wu, Bin Wu, Chunhua Wang, Xiujuan
collection	PubMed
description	Phishing has become one of the biggest and most effective cyber threats, causing hundreds of millions of dollars in losses and millions of data breaches every year. Currently, anti-phishing techniques require experts to extract phishing-site features and rely on third-party services to detect phishing sites. These techniques have limitations: first, extracting phishing features requires expertise and is time-consuming; second, the use of third-party services delays the detection of phishing sites. Hence, this paper proposes an integrated phishing website detection method based on convolutional neural networks (CNN) and random forest (RF). The method can predict the legitimacy of URLs without accessing the web content or using third-party services. The proposed technique uses character embedding to convert URLs into fixed-size matrices, extracts features at different levels using CNN models, classifies the multi-level features using multiple RF classifiers, and finally outputs the prediction using a winner-take-all approach. On the authors' dataset, the proposed model achieved an accuracy of 99.35%. On the benchmark data it achieved an accuracy of 99.26%, much higher than that of the existing extreme model.
format	Online Article Text
id	pubmed-8709380
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8709380 2021-12-25 Sensors (Basel) Article MDPI 2021-12-10 /pmc/articles/PMC8709380/ /pubmed/34960375 http://dx.doi.org/10.3390/s21248281 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Phishing Website Detection Based on Deep Convolutional Neural Network and Random Forest Ensemble Learning
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8709380/ https://www.ncbi.nlm.nih.gov/pubmed/34960375 http://dx.doi.org/10.3390/s21248281
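
The description field outlines a pipeline that turns each URL into a fixed-size matrix and combines several classifier outputs with a winner-take-all rule. The Python sketch below illustrates only those two end points, under stated assumptions: the alphabet, the fixed length `MAX_LEN`, and the function names are hypothetical and do not come from the record; a one-hot encoding stands in for the learned character embedding; and winner-take-all is read here as a simple majority vote. The CNN feature extractor and the random forest classifiers are not reproduced.

```python
import string
from collections import Counter

# Hypothetical character set; the record does not say which characters are used.
ALPHABET = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
# Index 0 is reserved for padding and for characters outside the alphabet.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 200  # assumed fixed URL length, not taken from the record


def encode_url(url, max_len=MAX_LEN):
    """Map a URL to a list of exactly max_len character indices."""
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in url.lower()[:max_len]]
    return indices + [0] * (max_len - len(indices))  # right-pad with zeros


def one_hot(indices, vocab_size=len(ALPHABET) + 1):
    """Expand indices into a len(indices) x vocab_size matrix of 0s and 1s."""
    return [[1 if col == idx else 0 for col in range(vocab_size)]
            for idx in indices]


def winner_take_all(votes):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    matrix = one_hot(encode_url("http://example.com/login"))
    print(len(matrix), len(matrix[0]))  # rows = MAX_LEN, columns = vocabulary size
    # Stand-in votes, as if produced by three separate classifiers.
    print(winner_take_all(["phishing", "legitimate", "phishing"]))
```

In a fuller version, each level of CNN features would feed its own classifier, and the votes passed to `winner_take_all` would be those classifiers' predictions for the same URL.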
