
How accurate are gender detection tools in predicting the gender for Chinese names? A study with 20,000 given names in Pinyin format

OBJECTIVE: We recently showed that the gender detection tools NamSor, Gender API, and Wiki-Gendersort accurately predicted the gender of individuals with Western given names. Here, we aimed to evaluate the performance of these tools with Chinese given names in Pinyin format. METHODS: We constructed...


Bibliographic Details
Main author:	Sebo, Paul
Format:	Online Article Text
Language:	English
Published:	University Library System, University of Pittsburgh 2022
Subjects:	Original Investigation
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9014919/ https://www.ncbi.nlm.nih.gov/pubmed/35440899 http://dx.doi.org/10.5195/jmla.2022.1289

_version_	1784688279415685120
author	Sebo, Paul
author_facet	Sebo, Paul
author_sort	Sebo, Paul
collection	PubMed
description	OBJECTIVE: We recently showed that the gender detection tools NamSor, Gender API, and Wiki-Gendersort accurately predicted the gender of individuals with Western given names. Here, we aimed to evaluate the performance of these tools with Chinese given names in Pinyin format. METHODS: We constructed two datasets for the purpose of the study. File #1 was created by randomly drawing 20,000 names from a gender-labeled database of 52,414 Chinese given names in Pinyin format. File #2, which contained 9,077 names, was created by removing from File #1 all unisex names that we were able to identify (i.e., those that were listed in the database as both male and female names). We recorded for both files the number of correct classifications (correct gender assigned to a name), misclassifications (wrong gender assigned to a name), and nonclassifications (no gender assigned). We then calculated the proportion of misclassifications and nonclassifications (errorCoded). RESULTS: For File #1, errorCoded was 53% for NamSor, 65% for Gender API, and 90% for Wiki-Gendersort. For File #2, errorCoded was 43% for NamSor, 66% for Gender API, and 94% for Wiki-Gendersort. CONCLUSION: We found that all three gender detection tools inaccurately predicted the gender of individuals with Chinese given names in Pinyin format and therefore should not be used in this population.
format	Online Article Text
id	pubmed-9014919
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	University Library System, University of Pittsburgh
record_format	MEDLINE/PubMed
spelling	pubmed-9014919 2022-04-18 How accurate are gender detection tools in predicting the gender for Chinese names? A study with 20,000 given names in Pinyin format Sebo, Paul J Med Libr Assoc Original Investigation OBJECTIVE: We recently showed that the gender detection tools NamSor, Gender API, and Wiki-Gendersort accurately predicted the gender of individuals with Western given names. Here, we aimed to evaluate the performance of these tools with Chinese given names in Pinyin format. METHODS: We constructed two datasets for the purpose of the study. File #1 was created by randomly drawing 20,000 names from a gender-labeled database of 52,414 Chinese given names in Pinyin format. File #2, which contained 9,077 names, was created by removing from File #1 all unisex names that we were able to identify (i.e., those that were listed in the database as both male and female names). We recorded for both files the number of correct classifications (correct gender assigned to a name), misclassifications (wrong gender assigned to a name), and nonclassifications (no gender assigned). We then calculated the proportion of misclassifications and nonclassifications (errorCoded). RESULTS: For File #1, errorCoded was 53% for NamSor, 65% for Gender API, and 90% for Wiki-Gendersort. For File #2, errorCoded was 43% for NamSor, 66% for Gender API, and 94% for Wiki-Gendersort. CONCLUSION: We found that all three gender detection tools inaccurately predicted the gender of individuals with Chinese given names in Pinyin format and therefore should not be used in this population. University Library System, University of Pittsburgh 2022-04-01 2022-04-01 /pmc/articles/PMC9014919/ /pubmed/35440899 http://dx.doi.org/10.5195/jmla.2022.1289 Text en Copyright © 2022 Paul Sebo https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Original Investigation Sebo, Paul How accurate are gender detection tools in predicting the gender for Chinese names? A study with 20,000 given names in Pinyin format
title	How accurate are gender detection tools in predicting the gender for Chinese names? A study with 20,000 given names in Pinyin format
title_full	How accurate are gender detection tools in predicting the gender for Chinese names? A study with 20,000 given names in Pinyin format
title_fullStr	How accurate are gender detection tools in predicting the gender for Chinese names? A study with 20,000 given names in Pinyin format
title_full_unstemmed	How accurate are gender detection tools in predicting the gender for Chinese names? A study with 20,000 given names in Pinyin format
title_short	How accurate are gender detection tools in predicting the gender for Chinese names? A study with 20,000 given names in Pinyin format
title_sort	how accurate are gender detection tools in predicting the gender for chinese names? a study with 20,000 given names in pinyin format
topic	Original Investigation
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9014919/ https://www.ncbi.nlm.nih.gov/pubmed/35440899 http://dx.doi.org/10.5195/jmla.2022.1289
work_keys_str_mv	AT sebopaul howaccuratearegenderdetectiontoolsinpredictingthegenderforchinesenamesastudywith20000givennamesinpinyinformat
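
The errorCoded measure named in the description field above can be illustrated with a minimal Python sketch. It assumes the denominator is the total number of names evaluated (correct classifications plus misclassifications plus nonclassifications), which the abstract implies but does not spell out. The function name `error_coded` and the counts in the usage line are hypothetical and are not taken from the study.

```python
def error_coded(correct: int, misclassified: int, nonclassified: int) -> float:
    """Proportion of names that were misclassified or left unclassified.

    Assumption: the denominator is every name evaluated, i.e. the sum of
    the three outcome counts. The abstract does not state it explicitly.
    """
    total = correct + misclassified + nonclassified
    if total == 0:
        raise ValueError("no names were evaluated")
    return (misclassified + nonclassified) / total


# Hypothetical counts for illustration only (not results from the study).
rate = error_coded(correct=70, misclassified=20, nonclassified=10)
print(f"errorCoded = {rate:.0%}")
```

Counting nonclassifications as errors means a tool cannot lower this measure by declining to assign a gender to names it finds difficult.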
