
Detecting and Monitoring Hate Speech in Twitter

Social Media are sensors in the real world that can be used to measure the pulse of societies. However, the massive and unfiltered feed of messages posted in social media is a phenomenon that nowadays raises social alarms, especially when these messages contain hate speech targeted to a specific ind...


Bibliographic Details
Main authors: Pereira-Kohatsu, Juan Carlos, Quijano-Sánchez, Lara, Liberatore, Federico, Camacho-Collados, Miguel
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6864473/
https://www.ncbi.nlm.nih.gov/pubmed/31717760
http://dx.doi.org/10.3390/s19214654
_version_ 1783471890929549312
author Pereira-Kohatsu, Juan Carlos
Quijano-Sánchez, Lara
Liberatore, Federico
Camacho-Collados, Miguel
collection PubMed
description Social media are sensors in the real world that can be used to measure the pulse of societies. However, the massive and unfiltered feed of messages posted on social media raises social alarm, especially when these messages contain hate speech targeted at a specific individual or group. In this context, governments and non-governmental organizations (NGOs) are concerned about the possible negative impact that these messages can have on individuals or on society. In this paper, we present HaterNet, an intelligent system currently being used by the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security that identifies and monitors the evolution of hate speech on Twitter. The contributions of this research are manifold: (1) It introduces the first intelligent system that monitors and visualizes, using social network analysis techniques, hate speech in social media. (2) It introduces a novel public dataset on hate speech in Spanish consisting of 6000 expert-labeled tweets. (3) It compares several classification approaches based on different document representation strategies and text classification models. (4) The best approach combines an LSTM+MLP neural network that takes as input the embeddings of the tweet's word, emoji, and expression tokens, enriched by tf-idf, and obtains an area under the curve (AUC) of 0.828 on our dataset, outperforming previous methods presented in the literature.
format Online
Article
Text
id pubmed-6864473
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6864473 2019-12-23 Detecting and Monitoring Hate Speech in Twitter Pereira-Kohatsu, Juan Carlos Quijano-Sánchez, Lara Liberatore, Federico Camacho-Collados, Miguel Sensors (Basel) Article MDPI 2019-10-26 /pmc/articles/PMC6864473/ /pubmed/31717760 http://dx.doi.org/10.3390/s19214654 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Detecting and Monitoring Hate Speech in Twitter
topic Article