
Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications

Social media applications, such as Twitter and Facebook, allow users to communicate and share their thoughts, status updates, opinions, photographs, and videos around the globe. Unfortunately, some people utilize these platforms to disseminate hate speech and abusive language. The growth of hate spe...


Bibliographic Details
Main Authors:	Bilal, Muhammad; Khan, Atif; Jan, Salman; Musa, Shahrulniza; Ali, Shaukat
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10143294/ https://www.ncbi.nlm.nih.gov/pubmed/37112249 http://dx.doi.org/10.3390/s23083909

author	Bilal, Muhammad; Khan, Atif; Jan, Salman; Musa, Shahrulniza; Ali, Shaukat
collection	PubMed
description	Social media applications, such as Twitter and Facebook, allow users to communicate and share their thoughts, status updates, opinions, photographs, and videos around the globe. Unfortunately, some people utilize these platforms to disseminate hate speech and abusive language. The growth of hate speech may result in hate crimes, cyber violence, and substantial harm to cyberspace, physical security, and social safety. As a result, hate speech detection is a critical issue for both cyberspace and physical society, necessitating the development of a robust application capable of detecting and combating it in real-time. Hate speech detection is a context-dependent problem that requires context-aware mechanisms for resolution. In this study, we employed a transformer-based model for Roman Urdu hate speech classification due to its ability to capture the text context. In addition, we developed the first Roman Urdu pre-trained BERT model, which we named BERT-RU. For this purpose, we exploited the capabilities of BERT by training it from scratch on the largest Roman Urdu dataset consisting of 173,714 text messages. Traditional and deep learning models were used as baseline models, including LSTM, BiLSTM, BiLSTM + Attention Layer, and CNN. We also investigated the concept of transfer learning by using pre-trained BERT embeddings in conjunction with deep learning models. The performance of each model was evaluated in terms of accuracy, precision, recall, and F-measure. The generalization of each model was evaluated on a cross-domain dataset. The experimental results revealed that the transformer-based model, when directly applied to the classification task of the Roman Urdu hate speech, outperformed traditional machine learning, deep learning models, and pre-trained transformer-based models in terms of accuracy, precision, recall, and F-measure, with scores of 96.70%, 97.25%, 96.74%, and 97.89%, respectively. 
In addition, the transformer-based model exhibited superior generalization on a cross-domain dataset.
format	Online Article Text
id	pubmed-10143294
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10143294 2023-04-29 Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications Bilal, Muhammad; Khan, Atif; Jan, Salman; Musa, Shahrulniza; Ali, Shaukat Sensors (Basel) Article MDPI 2023-04-12 /pmc/articles/PMC10143294/ /pubmed/37112249 http://dx.doi.org/10.3390/s23083909 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10143294/ https://www.ncbi.nlm.nih.gov/pubmed/37112249 http://dx.doi.org/10.3390/s23083909
