Bias and comparison framework for abusive language datasets

Bibliographic Details
Main authors: Wich, Maximilian; Eder, Tobias; Al Kuwatly, Hala; Groh, Georg
Format: Online Article Text
Language: English
Published: Springer International Publishing 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8288848/
https://www.ncbi.nlm.nih.gov/pubmed/34790954
http://dx.doi.org/10.1007/s43681-021-00081-0
Description
Summary: Recently, numerous datasets have been produced as research on the automatic detection of abusive language and hate speech has increased. A problem with this diversity is that the datasets often differ in, among other things, context, platform, sampling process, collection strategy, and labeling schema. There have been surveys of these datasets, but they compare the datasets only superficially. Therefore, we developed a bias and comparison framework for the in-depth analysis of abusive language datasets and used it to compare five English and six Arabic datasets. We make this framework available to researchers and data scientists who work with such datasets so that they can be aware of the properties of the datasets and consider them in their work.