Benchmark dataset of memes with text transcriptions for automatic detection of multi-modal misogynistic content

Bibliographic Details
Main authors:	Gasparini, Francesca; Rizzi, Giulia; Saibene, Aurora; Fersini, Elisabetta
Format:	Online Article Text
Language:	English
Published:	Elsevier 2022
Subjects:	Data Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9471366/ https://www.ncbi.nlm.nih.gov/pubmed/36117643 http://dx.doi.org/10.1016/j.dib.2022.108526

author	Gasparini, Francesca; Rizzi, Giulia; Saibene, Aurora; Fersini, Elisabetta
author_sort	Gasparini, Francesca
collection	PubMed
description	In this paper we present a benchmark dataset generated as part of a project for the automatic identification of misogyny in online content, with a particular focus on memes. The benchmark is composed of 800 memes collected from the most popular social media platforms, such as Facebook, Twitter, Instagram and Reddit, and from websites dedicated to the collection and creation of memes. To gather misogynistic memes, specific keywords that refer to misogynistic content were used as search criteria, covering different manifestations of hatred against women, such as body shaming, stereotyping, objectification and violence. In parallel, memes with no misogynistic content were manually downloaded from the same web sources. From all the collected memes, three domain experts selected a dataset of 800 memes equally balanced between misogynistic and non-misogynistic ones. This dataset was validated through a crowdsourcing platform, involving 60 subjects in the labelling process, in order to collect three evaluations for each instance. For memes evaluated as misogynistic, two further binary labels, concerning aggressiveness and irony, were collected from both the experts and the crowdsourcing platform. Finally, the text of each meme was manually transcribed. The dataset provided is thus composed of the 800 memes, the labels given by the experts and those obtained through the crowdsourcing validation, and the transcribed texts. These data can be used to approach the problem of automatically detecting misogynistic content on the Web by relying on both textual and visual cues, addressing phenomena that are growing every day, such as cybersexism and technology-facilitated violence.
format	Online Article Text
id	pubmed-9471366
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-9471366 2022-09-15 Data Brief Data Article Elsevier 2022-08-20 /pmc/articles/PMC9471366/ /pubmed/36117643 http://dx.doi.org/10.1016/j.dib.2022.108526 Text en © 2022 Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title	Benchmark dataset of memes with text transcriptions for automatic detection of multi-modal misogynistic content
topic	Data Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9471366/ https://www.ncbi.nlm.nih.gov/pubmed/36117643 http://dx.doi.org/10.1016/j.dib.2022.108526
