Comparison and benchmark of name-to-gender inference services

The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person’s gender from their name. Several such services exist that provide access to large databases of names, often enriched with inform...

Bibliographic Details
Main authors: Santamaría, Lucía; Mihaljević, Helena
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924484/
https://www.ncbi.nlm.nih.gov/pubmed/33816809
http://dx.doi.org/10.7717/peerj-cs.156
_version_ 1783659100040593408
author Santamaría, Lucía
Mihaljević, Helena
author_facet Santamaría, Lucía
Mihaljević, Helena
author_sort Santamaría, Lucía
collection PubMed
description The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person’s gender from their name. Several such services exist that provide access to large databases of names, often enriched with information from social media profiles, culture-specific rules, and insights from sociolinguistics. We compare and benchmark five name-to-gender inference services by applying them to the classification of a test data set consisting of 7,076 manually labeled names. The compiled names are analyzed and characterized according to their geographical and cultural origin. We define a series of performance metrics to quantify various types of classification errors, and define a parameter tuning procedure to search for optimal values of the services’ free parameters. Finally, we perform benchmarks of all services under study regarding several scenarios where a particular metric is to be optimized.
format Online
Article
Text
id pubmed-7924484
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-79244842021-04-02 Comparison and benchmark of name-to-gender inference services Santamaría, Lucía Mihaljević, Helena PeerJ Comput Sci Data Mining and Machine Learning The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person’s gender from their name. Several such services exist that provide access to large databases of names, often enriched with information from social media profiles, culture-specific rules, and insights from sociolinguistics. We compare and benchmark five name-to-gender inference services by applying them to the classification of a test data set consisting of 7,076 manually labeled names. The compiled names are analyzed and characterized according to their geographical and cultural origin. We define a series of performance metrics to quantify various types of classification errors, and define a parameter tuning procedure to search for optimal values of the services’ free parameters. Finally, we perform benchmarks of all services under study regarding several scenarios where a particular metric is to be optimized. PeerJ Inc. 2018-07-16 /pmc/articles/PMC7924484/ /pubmed/33816809 http://dx.doi.org/10.7717/peerj-cs.156 Text en ©2018 Santamaría and Mihaljević http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
spellingShingle Data Mining and Machine Learning
Santamaría, Lucía
Mihaljević, Helena
Comparison and benchmark of name-to-gender inference services
title Comparison and benchmark of name-to-gender inference services
title_full Comparison and benchmark of name-to-gender inference services
title_fullStr Comparison and benchmark of name-to-gender inference services
title_full_unstemmed Comparison and benchmark of name-to-gender inference services
title_short Comparison and benchmark of name-to-gender inference services
title_sort comparison and benchmark of name-to-gender inference services
topic Data Mining and Machine Learning
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924484/
https://www.ncbi.nlm.nih.gov/pubmed/33816809
http://dx.doi.org/10.7717/peerj-cs.156
work_keys_str_mv AT santamarialucia comparisonandbenchmarkofnametogenderinferenceservices
AT mihaljevichelena comparisonandbenchmarkofnametogenderinferenceservices