Abusive language detection in youtube comments leveraging replies as conversational context

Nowadays, social media experience an increase in hostility, which leads to many people suffering from online abusive behavior and harassment. We introduce a new publicly available annotated dataset for abusive language detection in short texts. The dataset includes comments from YouTube, along with...

Bibliographic Details
Main Authors: Ashraf, Noman, Zubiaga, Arkaitz, Gelbukh, Alexander
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Computational Linguistics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8507480/
https://www.ncbi.nlm.nih.gov/pubmed/34712802
http://dx.doi.org/10.7717/peerj-cs.742
author Ashraf, Noman
Zubiaga, Arkaitz
Gelbukh, Alexander
collection PubMed
description Nowadays, social media experience an increase in hostility, which leads to many people suffering from online abusive behavior and harassment. We introduce a new publicly available annotated dataset for abusive language detection in short texts. The dataset includes comments from YouTube, along with contextual information: replies, video, video title, and the original description. The comments in the dataset are labeled as abusive or not and are classified by topic: politics, religion, and other. In particular, we discuss our refined annotation guidelines for such classification. We report a number of strong baselines on this dataset for the tasks of abusive language detection and topic classification, using a number of classifiers and text representations. We show that taking into account the conversational context, namely, replies, greatly improves the classification results as compared with using only linguistic features of the comments. We also study how the classification accuracy depends on the topic of the comment.
format Online
Article
Text
id pubmed-8507480
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8507480 2021-10-27 Abusive language detection in youtube comments leveraging replies as conversational context Ashraf, Noman Zubiaga, Arkaitz Gelbukh, Alexander PeerJ Comput Sci Computational Linguistics Nowadays, social media experience an increase in hostility, which leads to many people suffering from online abusive behavior and harassment. We introduce a new publicly available annotated dataset for abusive language detection in short texts. The dataset includes comments from YouTube, along with contextual information: replies, video, video title, and the original description. The comments in the dataset are labeled as abusive or not and are classified by topic: politics, religion, and other. In particular, we discuss our refined annotation guidelines for such classification. We report a number of strong baselines on this dataset for the tasks of abusive language detection and topic classification, using a number of classifiers and text representations. We show that taking into account the conversational context, namely, replies, greatly improves the classification results as compared with using only linguistic features of the comments. We also study how the classification accuracy depends on the topic of the comment. PeerJ Inc. 2021-10-08 /pmc/articles/PMC8507480/ /pubmed/34712802 http://dx.doi.org/10.7717/peerj-cs.742 Text en © 2021 Ashraf et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title Abusive language detection in youtube comments leveraging replies as conversational context
topic Computational Linguistics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8507480/
https://www.ncbi.nlm.nih.gov/pubmed/34712802
http://dx.doi.org/10.7717/peerj-cs.742