A World Full of Stereotypes? Further Investigation on Origin and Gender Bias in Multi-Lingual Word Embeddings

Publicly available off-the-shelf word embeddings that are often used in productive applications for natural language processing have been proven to be biased. We have previously shown that this bias can come in different forms, depending on the language and the cultural context. In this work, we ext...

Bibliographic Details
Main Authors: Kurpicz-Briki, Mascha; Leoni, Tomaso
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8209512/
https://www.ncbi.nlm.nih.gov/pubmed/34151257
http://dx.doi.org/10.3389/fdata.2021.625290
_version_ 1783709145056149504
author Kurpicz-Briki, Mascha
Leoni, Tomaso
author_facet Kurpicz-Briki, Mascha
Leoni, Tomaso
author_sort Kurpicz-Briki, Mascha
collection PubMed
description Publicly available off-the-shelf word embeddings that are often used in productive applications for natural language processing have been proven to be biased. We have previously shown that this bias can come in different forms, depending on the language and the cultural context. In this work, we extend our previous work and further investigate how bias varies in different languages. We examine Italian and Swedish word embeddings for gender and origin bias, and demonstrate how an origin bias concerning local migration groups in Switzerland is included in German word embeddings. We propose BiasWords, a method to automatically detect new forms of bias. Finally, we discuss how cultural and language aspects are relevant to the impact of bias on the application and to potential mitigation measures.
format Online
Article
Text
id pubmed-8209512
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8209512 2021-06-18 A World Full of Stereotypes? Further Investigation on Origin and Gender Bias in Multi-Lingual Word Embeddings Kurpicz-Briki, Mascha Leoni, Tomaso Front Big Data Big Data Publicly available off-the-shelf word embeddings that are often used in productive applications for natural language processing have been proven to be biased. We have previously shown that this bias can come in different forms, depending on the language and the cultural context. In this work, we extend our previous work and further investigate how bias varies in different languages. We examine Italian and Swedish word embeddings for gender and origin bias, and demonstrate how an origin bias concerning local migration groups in Switzerland is included in German word embeddings. We propose BiasWords, a method to automatically detect new forms of bias. Finally, we discuss how cultural and language aspects are relevant to the impact of bias on the application and to potential mitigation measures. Frontiers Media S.A. 2021-06-03 /pmc/articles/PMC8209512/ /pubmed/34151257 http://dx.doi.org/10.3389/fdata.2021.625290 Text en Copyright © 2021 Kurpicz-Briki and Leoni. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Big Data
Kurpicz-Briki, Mascha
Leoni, Tomaso
A World Full of Stereotypes? Further Investigation on Origin and Gender Bias in Multi-Lingual Word Embeddings
title A World Full of Stereotypes? Further Investigation on Origin and Gender Bias in Multi-Lingual Word Embeddings
topic Big Data
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8209512/
https://www.ncbi.nlm.nih.gov/pubmed/34151257
http://dx.doi.org/10.3389/fdata.2021.625290