Quantifying gender bias towards politicians in cross-lingual language models

Recent research has demonstrated that large pre-trained language models reflect societal biases expressed in natural language. The present paper introduces a simple method for probing language models to conduct a multilingual study of gender bias towards politicians. We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender. To this end, we curate a dataset of 250k politicians worldwide, including their names and gender. Our study is conducted in seven languages across six different language modeling architectures. The results demonstrate that pre-trained language models’ stance towards politicians varies strongly across analyzed languages. We find that while some words such as dead, and designated are associated with both male and female politicians, a few specific words such as beautiful and divorced are predominantly associated with female politicians. Finally, and contrary to previous findings, our study suggests that larger language models do not tend to be significantly more gender-biased than smaller ones.
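
To make the probing idea concrete, here is a minimal, hypothetical sketch of template-based probing with a masked language model: it asks the model for likely words next to a politician's name and aggregates the predictions by gender. The model choice (bert-base-multilingual-cased), the template, the toy name list, and the top-k cutoff are illustrative assumptions only, not the authors' actual pipeline, which additionally restricts the generated words to adjectives and verbs and works over a curated dataset of roughly 250k politicians.

# Illustrative sketch only (assumed model, template, and toy data; not the paper's pipeline).
from collections import Counter

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

# Toy examples; the paper curates ~250k politicians worldwide with their genders.
politicians = [("Angela Merkel", "F"), ("Barack Obama", "M")]

counts = {"F": Counter(), "M": Counter()}
for name, gender in politicians:
    # Template probing: which words does the model place next to the name?
    prompt = f"{name} is [MASK]."
    for prediction in fill_mask(prompt, top_k=20):
        counts[gender][prediction["token_str"]] += prediction["score"]

# Most heavily weighted completions per gender; the paper further filters these
# to adjectives and verbs before comparing the gender-conditioned distributions.
for gender, counter in counts.items():
    print(gender, counter.most_common(10))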


Bibliographic Details
Main Authors: Stańczak, Karolina; Ray Choudhury, Sagnik; Pimentel, Tiago; Cotterell, Ryan; Augenstein, Isabelle
Format: Online Article Text
Language: English
Published: Public Library of Science, 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10684026/
https://www.ncbi.nlm.nih.gov/pubmed/38015835
http://dx.doi.org/10.1371/journal.pone.0277640
_version_ 1785151311797288960
author Stańczak, Karolina
Ray Choudhury, Sagnik
Pimentel, Tiago
Cotterell, Ryan
Augenstein, Isabelle
author_facet Stańczak, Karolina
Ray Choudhury, Sagnik
Pimentel, Tiago
Cotterell, Ryan
Augenstein, Isabelle
author_sort Stańczak, Karolina
collection PubMed
description Recent research has demonstrated that large pre-trained language models reflect societal biases expressed in natural language. The present paper introduces a simple method for probing language models to conduct a multilingual study of gender bias towards politicians. We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender. To this end, we curate a dataset of 250k politicians worldwide, including their names and gender. Our study is conducted in seven languages across six different language modeling architectures. The results demonstrate that pre-trained language models’ stance towards politicians varies strongly across analyzed languages. We find that while some words such as dead, and designated are associated with both male and female politicians, a few specific words such as beautiful and divorced are predominantly associated with female politicians. Finally, and contrary to previous findings, our study suggests that larger language models do not tend to be significantly more gender-biased than smaller ones.
format Online
Article
Text
id pubmed-10684026
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10684026 2023-11-30 Quantifying gender bias towards politicians in cross-lingual language models Stańczak, Karolina; Ray Choudhury, Sagnik; Pimentel, Tiago; Cotterell, Ryan; Augenstein, Isabelle PLoS One Research Article Recent research has demonstrated that large pre-trained language models reflect societal biases expressed in natural language. The present paper introduces a simple method for probing language models to conduct a multilingual study of gender bias towards politicians. We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender. To this end, we curate a dataset of 250k politicians worldwide, including their names and gender. Our study is conducted in seven languages across six different language modeling architectures. The results demonstrate that pre-trained language models’ stance towards politicians varies strongly across analyzed languages. We find that while some words such as dead, and designated are associated with both male and female politicians, a few specific words such as beautiful and divorced are predominantly associated with female politicians. Finally, and contrary to previous findings, our study suggests that larger language models do not tend to be significantly more gender-biased than smaller ones. Public Library of Science 2023-11-28 /pmc/articles/PMC10684026/ /pubmed/38015835 http://dx.doi.org/10.1371/journal.pone.0277640 Text en © 2023 Stańczak et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Stańczak, Karolina
Ray Choudhury, Sagnik
Pimentel, Tiago
Cotterell, Ryan
Augenstein, Isabelle
Quantifying gender bias towards politicians in cross-lingual language models
title Quantifying gender bias towards politicians in cross-lingual language models
title_full Quantifying gender bias towards politicians in cross-lingual language models
title_fullStr Quantifying gender bias towards politicians in cross-lingual language models
title_full_unstemmed Quantifying gender bias towards politicians in cross-lingual language models
title_short Quantifying gender bias towards politicians in cross-lingual language models
title_sort quantifying gender bias towards politicians in cross-lingual language models
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10684026/
https://www.ncbi.nlm.nih.gov/pubmed/38015835
http://dx.doi.org/10.1371/journal.pone.0277640
work_keys_str_mv AT stanczakkarolina quantifyinggenderbiastowardspoliticiansincrosslinguallanguagemodels
AT raychoudhurysagnik quantifyinggenderbiastowardspoliticiansincrosslinguallanguagemodels
AT pimenteltiago quantifyinggenderbiastowardspoliticiansincrosslinguallanguagemodels
AT cotterellryan quantifyinggenderbiastowardspoliticiansincrosslinguallanguagemodels
AT augensteinisabelle quantifyinggenderbiastowardspoliticiansincrosslinguallanguagemodels