Abusive language detection in youtube comments leveraging replies as conversational context
Nowadays, social media experience an increase in hostility, which leads to many people suffering from online abusive behavior and harassment. We introduce a new publicly available annotated dataset for abusive language detection in short texts. The dataset includes comments from YouTube, along with...
Main Authors: | Ashraf, Noman; Zubiaga, Arkaitz; Gelbukh, Alexander |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2021 |
Subjects: | Computational Linguistics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8507480/ https://www.ncbi.nlm.nih.gov/pubmed/34712802 http://dx.doi.org/10.7717/peerj-cs.742 |
_version_ | 1784581865665986560 |
---|---|
author | Ashraf, Noman Zubiaga, Arkaitz Gelbukh, Alexander |
author_facet | Ashraf, Noman Zubiaga, Arkaitz Gelbukh, Alexander |
author_sort | Ashraf, Noman |
collection | PubMed |
description | Nowadays, social media experience an increase in hostility, which leads to many people suffering from online abusive behavior and harassment. We introduce a new publicly available annotated dataset for abusive language detection in short texts. The dataset includes comments from YouTube, along with contextual information: replies, video, video title, and the original description. The comments in the dataset are labeled as abusive or not and are classified by topic: politics, religion, and other. In particular, we discuss our refined annotation guidelines for such classification. We report a number of strong baselines on this dataset for the tasks of abusive language detection and topic classification, using a number of classifiers and text representations. We show that taking into account the conversational context, namely, replies, greatly improves the classification results as compared with using only linguistic features of the comments. We also study how the classification accuracy depends on the topic of the comment. |
format | Online Article Text |
id | pubmed-8507480 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8507480 2021-10-27 Abusive language detection in youtube comments leveraging replies as conversational context Ashraf, Noman Zubiaga, Arkaitz Gelbukh, Alexander PeerJ Comput Sci Computational Linguistics Nowadays, social media experience an increase in hostility, which leads to many people suffering from online abusive behavior and harassment. We introduce a new publicly available annotated dataset for abusive language detection in short texts. The dataset includes comments from YouTube, along with contextual information: replies, video, video title, and the original description. The comments in the dataset are labeled as abusive or not and are classified by topic: politics, religion, and other. In particular, we discuss our refined annotation guidelines for such classification. We report a number of strong baselines on this dataset for the tasks of abusive language detection and topic classification, using a number of classifiers and text representations. We show that taking into account the conversational context, namely, replies, greatly improves the classification results as compared with using only linguistic features of the comments. We also study how the classification accuracy depends on the topic of the comment. PeerJ Inc. 2021-10-08 /pmc/articles/PMC8507480/ /pubmed/34712802 http://dx.doi.org/10.7717/peerj-cs.742 Text en © 2021 Ashraf et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
title | Abusive language detection in youtube comments leveraging replies as conversational context |
topic | Computational Linguistics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8507480/ https://www.ncbi.nlm.nih.gov/pubmed/34712802 http://dx.doi.org/10.7717/peerj-cs.742 |