Evaluation of Federated Learning in Phishing Email Detection

The use of artificial intelligence (AI) to detect phishing emails is primarily dependent on large-scale centralized datasets, which has opened it up to a myriad of privacy, trust, and legal issues. Moreover, organizations have been loath to share emails, given the risk of leaking commercially sensit...

Bibliographic Details
Main authors: Thapa, Chandra; Tang, Jun Wen; Abuadbba, Alsharif; Gao, Yansong; Camtepe, Seyit; Nepal, Surya; Almashor, Mahathir; Zheng, Yifeng
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181597/
https://www.ncbi.nlm.nih.gov/pubmed/37177549
http://dx.doi.org/10.3390/s23094346
_version_ 1785041612263391232
author Thapa, Chandra
Tang, Jun Wen
Abuadbba, Alsharif
Gao, Yansong
Camtepe, Seyit
Nepal, Surya
Almashor, Mahathir
Zheng, Yifeng
author_facet Thapa, Chandra
Tang, Jun Wen
Abuadbba, Alsharif
Gao, Yansong
Camtepe, Seyit
Nepal, Surya
Almashor, Mahathir
Zheng, Yifeng
author_sort Thapa, Chandra
collection PubMed
description The use of artificial intelligence (AI) to detect phishing emails is primarily dependent on large-scale centralized datasets, which has opened it up to a myriad of privacy, trust, and legal issues. Moreover, organizations have been loath to share emails, given the risk of leaking commercially sensitive information. Consequently, it has been difficult to obtain sufficient emails to train a global AI model efficiently. Accordingly, privacy-preserving distributed and collaborative machine learning, particularly federated learning (FL), is a desideratum. As it is already prevalent in the healthcare sector, questions remain regarding the effectiveness and efficacy of FL-based phishing detection within the context of multi-organization collaborations. To the best of our knowledge, the work herein was the first to investigate the use of FL in phishing email detection. This study focused on building upon a deep neural network model, particularly recurrent convolutional neural network (RNN) and bidirectional encoder representations from transformers (BERT), for phishing email detection. We analyzed the FL-entangled learning performance in various settings, including (i) a balanced and asymmetrical data distribution among organizations and (ii) scalability. Our results corroborated the comparable performance statistics of FL in phishing email detection to centralized learning for balanced datasets and low organizational counts. Moreover, we observed a variation in performance when increasing the organizational counts. For a fixed total email dataset, the global RNN-based model had a 1.8% accuracy decrease when the organizational counts were increased from 2 to 10. In contrast, BERT accuracy increased by 0.6% when increasing organizational counts from 2 to 5. However, if we increased the overall email dataset by introducing new organizations in the FL framework, the organizational level performance improved by achieving a faster convergence speed. In addition, FL suffered in its overall global model performance due to highly unstable outputs if the email dataset distribution was highly asymmetric.
format Online
Article
Text
id pubmed-10181597
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10181597 2023-05-13 Evaluation of Federated Learning in Phishing Email Detection Thapa, Chandra Tang, Jun Wen Abuadbba, Alsharif Gao, Yansong Camtepe, Seyit Nepal, Surya Almashor, Mahathir Zheng, Yifeng Sensors (Basel) Article The use of artificial intelligence (AI) to detect phishing emails is primarily dependent on large-scale centralized datasets, which has opened it up to a myriad of privacy, trust, and legal issues. Moreover, organizations have been loath to share emails, given the risk of leaking commercially sensitive information. Consequently, it has been difficult to obtain sufficient emails to train a global AI model efficiently. Accordingly, privacy-preserving distributed and collaborative machine learning, particularly federated learning (FL), is a desideratum. As it is already prevalent in the healthcare sector, questions remain regarding the effectiveness and efficacy of FL-based phishing detection within the context of multi-organization collaborations. To the best of our knowledge, the work herein was the first to investigate the use of FL in phishing email detection. This study focused on building upon a deep neural network model, particularly recurrent convolutional neural network (RNN) and bidirectional encoder representations from transformers (BERT), for phishing email detection. We analyzed the FL-entangled learning performance in various settings, including (i) a balanced and asymmetrical data distribution among organizations and (ii) scalability. Our results corroborated the comparable performance statistics of FL in phishing email detection to centralized learning for balanced datasets and low organizational counts. Moreover, we observed a variation in performance when increasing the organizational counts. For a fixed total email dataset, the global RNN-based model had a 1.8% accuracy decrease when the organizational counts were increased from 2 to 10. In contrast, BERT accuracy increased by 0.6% when increasing organizational counts from 2 to 5. However, if we increased the overall email dataset by introducing new organizations in the FL framework, the organizational level performance improved by achieving a faster convergence speed. In addition, FL suffered in its overall global model performance due to highly unstable outputs if the email dataset distribution was highly asymmetric. MDPI 2023-04-27 /pmc/articles/PMC10181597/ /pubmed/37177549 http://dx.doi.org/10.3390/s23094346 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Thapa, Chandra
Tang, Jun Wen
Abuadbba, Alsharif
Gao, Yansong
Camtepe, Seyit
Nepal, Surya
Almashor, Mahathir
Zheng, Yifeng
Evaluation of Federated Learning in Phishing Email Detection
title Evaluation of Federated Learning in Phishing Email Detection
title_full Evaluation of Federated Learning in Phishing Email Detection
title_fullStr Evaluation of Federated Learning in Phishing Email Detection
title_full_unstemmed Evaluation of Federated Learning in Phishing Email Detection
title_short Evaluation of Federated Learning in Phishing Email Detection
title_sort evaluation of federated learning in phishing email detection
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181597/
https://www.ncbi.nlm.nih.gov/pubmed/37177549
http://dx.doi.org/10.3390/s23094346
work_keys_str_mv AT thapachandra evaluationoffederatedlearninginphishingemaildetection
AT tangjunwen evaluationoffederatedlearninginphishingemaildetection
AT abuadbbaalsharif evaluationoffederatedlearninginphishingemaildetection
AT gaoyansong evaluationoffederatedlearninginphishingemaildetection
AT camtepeseyit evaluationoffederatedlearninginphishingemaildetection
AT nepalsurya evaluationoffederatedlearninginphishingemaildetection
AT almashormahathir evaluationoffederatedlearninginphishingemaildetection
AT zhengyifeng evaluationoffederatedlearninginphishingemaildetection