Analyzing and learning the language for different types of harassment

THIS ARTICLE USES WORDS OR LANGUAGE THAT IS CONSIDERED PROFANE, VULGAR, OR OFFENSIVE BY SOME READERS. The presence of a significant amount of harassment in user-generated content and its negative impact calls for robust automatic detection approaches. This requires the identification of different ty...

Bibliographic Details
Main Authors: Rezvan, Mohammadreza; Shekarpour, Saeedeh; Alshargi, Faisal; Thirunarayan, Krishnaprasad; Shalin, Valerie L.; Sheth, Amit
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7100939/
https://www.ncbi.nlm.nih.gov/pubmed/32218569
http://dx.doi.org/10.1371/journal.pone.0227330
_version_ 1783511519450890240
author Rezvan, Mohammadreza
Shekarpour, Saeedeh
Alshargi, Faisal
Thirunarayan, Krishnaprasad
Shalin, Valerie L.
Sheth, Amit
author_facet Rezvan, Mohammadreza
Shekarpour, Saeedeh
Alshargi, Faisal
Thirunarayan, Krishnaprasad
Shalin, Valerie L.
Sheth, Amit
author_sort Rezvan, Mohammadreza
collection PubMed
description THIS ARTICLE USES WORDS OR LANGUAGE THAT IS CONSIDERED PROFANE, VULGAR, OR OFFENSIVE BY SOME READERS. The presence of a significant amount of harassment in user-generated content and its negative impact calls for robust automatic detection approaches. This requires the identification of different types of harassment. Earlier work has classified harassing language in terms of hurtfulness, abusiveness, sentiment, and profanity. However, to identify and understand harassment more accurately, it is essential to determine the contextual type that captures the interrelated conditions in which harassing language occurs. In this paper we introduce the notion of contextual type in harassment by distinguishing between five contextual types: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual and (v) political. We utilize an annotated corpus from Twitter distinguishing these types of harassment. We study the context of each kind to shed light on the linguistic meaning, interpretation, and distribution, with results from two lines of investigation: an extensive linguistic analysis, and the statistical distribution of uni-grams. We then build type-aware classifiers to automate the identification of type-specific harassment. Our experiments demonstrate that these classifiers provide competitive accuracy for identifying and analyzing harassment on social media. We present extensive discussion and significant observations about the effectiveness of type-aware classifiers using a detailed comparison setup, providing insight into the role of type-dependent features.
format Online
Article
Text
id pubmed-7100939
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-71009392020-04-03 Analyzing and learning the language for different types of harassment Rezvan, Mohammadreza Shekarpour, Saeedeh Alshargi, Faisal Thirunarayan, Krishnaprasad Shalin, Valerie L. Sheth, Amit PLoS One Research Article THIS ARTICLE USES WORDS OR LANGUAGE THAT IS CONSIDERED PROFANE, VULGAR, OR OFFENSIVE BY SOME READERS. The presence of a significant amount of harassment in user-generated content and its negative impact calls for robust automatic detection approaches. This requires the identification of different types of harassment. Earlier work has classified harassing language in terms of hurtfulness, abusiveness, sentiment, and profanity. However, to identify and understand harassment more accurately, it is essential to determine the contextual type that captures the interrelated conditions in which harassing language occurs. In this paper we introduce the notion of contextual type in harassment by distinguishing between five contextual types: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual and (v) political. We utilize an annotated corpus from Twitter distinguishing these types of harassment. We study the context of each kind to shed light on the linguistic meaning, interpretation, and distribution, with results from two lines of investigation: an extensive linguistic analysis, and the statistical distribution of uni-grams. We then build type-aware classifiers to automate the identification of type-specific harassment. Our experiments demonstrate that these classifiers provide competitive accuracy for identifying and analyzing harassment on social media. We present extensive discussion and significant observations about the effectiveness of type-aware classifiers using a detailed comparison setup, providing insight into the role of type-dependent features.
Public Library of Science 2020-03-27 /pmc/articles/PMC7100939/ /pubmed/32218569 http://dx.doi.org/10.1371/journal.pone.0227330 Text en © 2020 Rezvan et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Rezvan, Mohammadreza
Shekarpour, Saeedeh
Alshargi, Faisal
Thirunarayan, Krishnaprasad
Shalin, Valerie L.
Sheth, Amit
Analyzing and learning the language for different types of harassment
title Analyzing and learning the language for different types of harassment
title_full Analyzing and learning the language for different types of harassment
title_fullStr Analyzing and learning the language for different types of harassment
title_full_unstemmed Analyzing and learning the language for different types of harassment
title_short Analyzing and learning the language for different types of harassment
title_sort analyzing and learning the language for different types of harassment
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7100939/
https://www.ncbi.nlm.nih.gov/pubmed/32218569
http://dx.doi.org/10.1371/journal.pone.0227330
work_keys_str_mv AT rezvanmohammadreza analyzingandlearningthelanguagefordifferenttypesofharassment
AT shekarpoursaeedeh analyzingandlearningthelanguagefordifferenttypesofharassment
AT alshargifaisal analyzingandlearningthelanguagefordifferenttypesofharassment
AT thirunarayankrishnaprasad analyzingandlearningthelanguagefordifferenttypesofharassment
AT shalinvaleriel analyzingandlearningthelanguagefordifferenttypesofharassment
AT shethamit analyzingandlearningthelanguagefordifferenttypesofharassment