
An analysis of unconscious gender bias in academic texts by means of a decision algorithm

Inclusive language focuses on using vocabulary that avoids exclusion or discrimination, especially with regard to gender. The task of finding gender bias in written documents must be performed manually, which is time-consuming. Consequently, studying the usage of non-inclusive language on a...


Bibliographic Details
Main Authors: Orgeira-Crespo, Pedro; Míguez-Álvarez, Carla; Cuevas-Alonso, Miguel; Rivo-López, Elena
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8483299/
https://www.ncbi.nlm.nih.gov/pubmed/34591923
http://dx.doi.org/10.1371/journal.pone.0257903
author Orgeira-Crespo, Pedro
Míguez-Álvarez, Carla
Cuevas-Alonso, Miguel
Rivo-López, Elena
collection PubMed
description Inclusive language focuses on using vocabulary that avoids exclusion or discrimination, especially with regard to gender. The task of finding gender bias in written documents must be performed manually, which is time-consuming. Consequently, studying the use of non-inclusive language in a document, and the impact of document properties (such as author gender or date of presentation) on the number of non-inclusive instances found, is difficult or even impossible for large datasets. This research analyzes gender bias in academic texts using a study corpus of more than 12,000 million words obtained from more than one hundred thousand doctoral theses from Spanish universities. For this purpose, an automated algorithm was developed to evaluate the characteristics of each document and look for interactions between age, year of publication, gender, and the field of knowledge in which the doctoral thesis is framed. The algorithm identified information patterns using a convolutional neural network (CNN) applied to a vector representation of the sentences. The results showed evidence that bias increased with the age of the authors and that men were more likely to use non-inclusive terms (up to an index of 23.12); awareness of inclusiveness was greater in women than in men in all age ranges (with an average of 14.99), and this awareness grows the younger the candidate is (falling to 13.07).
In terms of field of knowledge, the humanities are the most biased (20.97), excluding the subgroup of Linguistics, which has the least bias at all levels (9.90); the field of science and engineering is also among the least biased (13.46). These results support the assumption that bias in academic texts (doctoral theses) is unconscious: otherwise, it would not depend on field, age, or gender, and would occur in the same proportion in every field. The main innovation of this research is the ability to detect whether the language used in a Spanish text can be considered non-inclusive, based on a CNN trained in the context of doctoral theses. A large number of documents was used, comprising all accessible doctoral theses from Spanish universities of the last 40 years; a dataset of this size is manageable only with data mining systems, and the training makes it possible to identify the terms in context effectively and compile them into a novel dictionary of non-inclusive terms.
format Online
Article
Text
id pubmed-8483299
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8483299 2021-10-01 An analysis of unconscious gender bias in academic texts by means of a decision algorithm Orgeira-Crespo, Pedro Míguez-Álvarez, Carla Cuevas-Alonso, Miguel Rivo-López, Elena PLoS One Research Article Public Library of Science 2021-09-30 /pmc/articles/PMC8483299/ /pubmed/34591923 http://dx.doi.org/10.1371/journal.pone.0257903 Text en © 2021 Orgeira-Crespo et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title An analysis of unconscious gender bias in academic texts by means of a decision algorithm
topic Research Article