Addressing religious hate online: from taxonomy creation to automated detection
Abusive language in online social media is a pervasive and harmful phenomenon which calls for automatic computational approaches to be successfully contained. Previous studies have introduced corpora and natural language processing approaches for specific kinds of online abuse, mainly focusing on mi...
Main authors: | Ramponi, Alan; Testa, Benedetta; Tonelli, Sara; Jezek, Elisabetta |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2022 |
Subjects: | Artificial Intelligence |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280248/ https://www.ncbi.nlm.nih.gov/pubmed/37346317 http://dx.doi.org/10.7717/peerj-cs.1128 |
_version_ | 1785060756692140032 |
---|---|
author | Ramponi, Alan Testa, Benedetta Tonelli, Sara Jezek, Elisabetta |
author_facet | Ramponi, Alan Testa, Benedetta Tonelli, Sara Jezek, Elisabetta |
author_sort | Ramponi, Alan |
collection | PubMed |
description | Abusive language in online social media is a pervasive and harmful phenomenon which calls for automatic computational approaches to be successfully contained. Previous studies have introduced corpora and natural language processing approaches for specific kinds of online abuse, mainly focusing on misogyny and racism. A current underexplored area in this context is religious hate, for which efforts in data and methods to date have been rather scattered. This is exacerbated by different annotation schemes that available datasets use, which inevitably lead to poor repurposing of data in wider contexts. Furthermore, religious hate is very much dependent on country-specific factors, including the presence and visibility of religious minorities, societal issues, historical background, and current political decisions. Motivated by the lack of annotated data specifically tailoring religion and the poor interoperability of current datasets, in this article we propose a fine-grained labeling scheme for religious hate speech detection. Such scheme lies on a wider and highly-interoperable taxonomy of abusive language, and covers the three main monotheistic religions: Judaism, Christianity and Islam. Moreover, we introduce a Twitter dataset in two languages—English and Italian—that has been annotated following the proposed annotation scheme. We experiment with several classification algorithms on the annotated dataset, from traditional machine learning classifiers to recent transformer-based language models, assessing the difficulty of two tasks: abusive language detection and religious hate speech detection. Finally, we investigate the cross-lingual transferability of multilingual models on the tasks, shedding light on the viability of repurposing our dataset for religious hate speech detection on low-resource languages. 
We release the annotated data and publicly distribute the code for our classification experiments at https://github.com/dhfbk/religious-hate-speech. |
format | Online Article Text |
id | pubmed-10280248 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-102802482023-06-21 Addressing religious hate online: from taxonomy creation to automated detection Ramponi, Alan Testa, Benedetta Tonelli, Sara Jezek, Elisabetta PeerJ Comput Sci Artificial Intelligence Abusive language in online social media is a pervasive and harmful phenomenon which calls for automatic computational approaches to be successfully contained. Previous studies have introduced corpora and natural language processing approaches for specific kinds of online abuse, mainly focusing on misogyny and racism. A current underexplored area in this context is religious hate, for which efforts in data and methods to date have been rather scattered. This is exacerbated by different annotation schemes that available datasets use, which inevitably lead to poor repurposing of data in wider contexts. Furthermore, religious hate is very much dependent on country-specific factors, including the presence and visibility of religious minorities, societal issues, historical background, and current political decisions. Motivated by the lack of annotated data specifically tailoring religion and the poor interoperability of current datasets, in this article we propose a fine-grained labeling scheme for religious hate speech detection. Such scheme lies on a wider and highly-interoperable taxonomy of abusive language, and covers the three main monotheistic religions: Judaism, Christianity and Islam. Moreover, we introduce a Twitter dataset in two languages—English and Italian—that has been annotated following the proposed annotation scheme. We experiment with several classification algorithms on the annotated dataset, from traditional machine learning classifiers to recent transformer-based language models, assessing the difficulty of two tasks: abusive language detection and religious hate speech detection. 
Finally, we investigate the cross-lingual transferability of multilingual models on the tasks, shedding light on the viability of repurposing our dataset for religious hate speech detection on low-resource languages. We release the annotated data and publicly distribute the code for our classification experiments at https://github.com/dhfbk/religious-hate-speech. PeerJ Inc. 2022-12-15 /pmc/articles/PMC10280248/ /pubmed/37346317 http://dx.doi.org/10.7717/peerj-cs.1128 Text en ©2022 Ramponi et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Artificial Intelligence Ramponi, Alan Testa, Benedetta Tonelli, Sara Jezek, Elisabetta Addressing religious hate online: from taxonomy creation to automated detection |
title | Addressing religious hate online: from taxonomy creation to automated detection |
title_full | Addressing religious hate online: from taxonomy creation to automated detection |
title_fullStr | Addressing religious hate online: from taxonomy creation to automated detection |
title_full_unstemmed | Addressing religious hate online: from taxonomy creation to automated detection |
title_short | Addressing religious hate online: from taxonomy creation to automated detection |
title_sort | addressing religious hate online: from taxonomy creation to automated detection |
topic | Artificial Intelligence |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280248/ https://www.ncbi.nlm.nih.gov/pubmed/37346317 http://dx.doi.org/10.7717/peerj-cs.1128 |
work_keys_str_mv | AT ramponialan addressingreligioushateonlinefromtaxonomycreationtoautomateddetection AT testabenedetta addressingreligioushateonlinefromtaxonomycreationtoautomateddetection AT tonellisara addressingreligioushateonlinefromtaxonomycreationtoautomateddetection AT jezekelisabetta addressingreligioushateonlinefromtaxonomycreationtoautomateddetection |