Not All Swear Words Are Used Equal: Attention over Word n-grams for Abusive Language Identification

The increasing propagation of abusive language in social media is a major concern for supplier companies and governments because of its negative social impact. A large number of methods have been developed for its automatic identification, ranging from dictionary-based methods to sophisticated deep...

Bibliographic Details
Main authors: Jarquín-Vásquez, Horacio Jesús; Montes-y-Gómez, Manuel; Villaseñor-Pineda, Luis
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7297587/
http://dx.doi.org/10.1007/978-3-030-49076-8_27
author Jarquín-Vásquez, Horacio Jesús
Montes-y-Gómez, Manuel
Villaseñor-Pineda, Luis
collection PubMed
description The increasing propagation of abusive language in social media is a major concern for supplier companies and governments because of its negative social impact. A large number of methods have been developed for its automatic identification, ranging from dictionary-based methods to sophisticated deep learning approaches. A common problem in all these methods is distinguishing the offensive use of swear words from their everyday and humorous usage. To tackle this particular issue, we propose an attention-based neural network architecture that captures the importance of word n-grams according to their context. The results obtained on four standard collections from Twitter and Facebook are encouraging: they outperform the [Formula: see text] scores of state-of-the-art methods and allow identifying a set of inherently offensive swear words, as well as others whose interpretation depends on their context.
format Online
Article
Text
id pubmed-7297587
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7297587 2020-06-17 Not All Swear Words Are Used Equal: Attention over Word n-grams for Abusive Language Identification Jarquín-Vásquez, Horacio Jesús Montes-y-Gómez, Manuel Villaseñor-Pineda, Luis Pattern Recognition Article The increasing propagation of abusive language in social media is a major concern for supplier companies and governments because of its negative social impact. A large number of methods have been developed for its automatic identification, ranging from dictionary-based methods to sophisticated deep learning approaches. A common problem in all these methods is distinguishing the offensive use of swear words from their everyday and humorous usage. To tackle this particular issue, we propose an attention-based neural network architecture that captures the importance of word n-grams according to their context. The results obtained on four standard collections from Twitter and Facebook are encouraging: they outperform the [Formula: see text] scores of state-of-the-art methods and allow identifying a set of inherently offensive swear words, as well as others whose interpretation depends on their context. 2020-04-29 /pmc/articles/PMC7297587/ http://dx.doi.org/10.1007/978-3-030-49076-8_27 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Not All Swear Words Are Used Equal: Attention over Word n-grams for Abusive Language Identification
topic Article