
Addressing religious hate online: from taxonomy creation to automated detection

Bibliographic Details
Main Authors: Ramponi, Alan; Testa, Benedetta; Tonelli, Sara; Jezek, Elisabetta
Format: Online, Article, Text
Language: English
Published: PeerJ Inc. 2022
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280248/
https://www.ncbi.nlm.nih.gov/pubmed/37346317
http://dx.doi.org/10.7717/peerj-cs.1128
author Ramponi, Alan
Testa, Benedetta
Tonelli, Sara
Jezek, Elisabetta
collection PubMed
description Abusive language in online social media is a pervasive and harmful phenomenon, and containing it successfully calls for automatic computational approaches. Previous studies have introduced corpora and natural language processing approaches for specific kinds of online abuse, mainly focusing on misogyny and racism. One area that remains underexplored in this context is religious hate, for which efforts in data and methods to date have been rather scattered. The problem is exacerbated by the different annotation schemes that available datasets use, which inevitably lead to poor repurposing of data in wider contexts. Furthermore, religious hate depends heavily on country-specific factors, including the presence and visibility of religious minorities, societal issues, historical background, and current political decisions. Motivated by the lack of annotated data specifically tailored to religion and by the poor interoperability of current datasets, in this article we propose a fine-grained labeling scheme for religious hate speech detection. The scheme builds on a wider, highly interoperable taxonomy of abusive language and covers the three main monotheistic religions: Judaism, Christianity and Islam. Moreover, we introduce a Twitter dataset in two languages, English and Italian, annotated according to the proposed scheme. We experiment with several classification algorithms on the annotated dataset, from traditional machine learning classifiers to recent transformer-based language models, assessing the difficulty of two tasks: abusive language detection and religious hate speech detection. Finally, we investigate the cross-lingual transferability of multilingual models on both tasks, shedding light on the viability of repurposing our dataset for religious hate speech detection in low-resource languages. We release the annotated data and publicly distribute the code for our classification experiments at https://github.com/dhfbk/religious-hate-speech.
format Online
Article
Text
id pubmed-10280248
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-10280248 2023-06-21
PeerJ Comput Sci
Artificial Intelligence
PeerJ Inc. 2022-12-15
©2022 Ramponi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title Addressing religious hate online: from taxonomy creation to automated detection
topic Artificial Intelligence