Comparison and benchmark of name-to-gender inference services
The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person’s gender from their name. Several such services exist that provide access to large databases of names, often enriched with inform...
Main authors: | Santamaría, Lucía; Mihaljević, Helena |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2018 |
Subjects: | Data Mining and Machine Learning |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924484/ https://www.ncbi.nlm.nih.gov/pubmed/33816809 http://dx.doi.org/10.7717/peerj-cs.156 |
_version_ | 1783659100040593408 |
---|---|
author | Santamaría, Lucía Mihaljević, Helena |
author_facet | Santamaría, Lucía Mihaljević, Helena |
author_sort | Santamaría, Lucía |
collection | PubMed |
description | The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person’s gender from their name. Several such services exist that provide access to large databases of names, often enriched with information from social media profiles, culture-specific rules, and insights from sociolinguistics. We compare and benchmark five name-to-gender inference services by applying them to the classification of a test data set consisting of 7,076 manually labeled names. The compiled names are analyzed and characterized according to their geographical and cultural origin. We define a series of performance metrics to quantify various types of classification errors, and define a parameter tuning procedure to search for optimal values of the services’ free parameters. Finally, we perform benchmarks of all services under study regarding several scenarios where a particular metric is to be optimized. |
format | Online Article Text |
id | pubmed-7924484 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7924484 2021-04-02 Comparison and benchmark of name-to-gender inference services Santamaría, Lucía Mihaljević, Helena PeerJ Comput Sci Data Mining and Machine Learning The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person’s gender from their name. Several such services exist that provide access to large databases of names, often enriched with information from social media profiles, culture-specific rules, and insights from sociolinguistics. We compare and benchmark five name-to-gender inference services by applying them to the classification of a test data set consisting of 7,076 manually labeled names. The compiled names are analyzed and characterized according to their geographical and cultural origin. We define a series of performance metrics to quantify various types of classification errors, and define a parameter tuning procedure to search for optimal values of the services’ free parameters. Finally, we perform benchmarks of all services under study regarding several scenarios where a particular metric is to be optimized. PeerJ Inc. 2018-07-16 /pmc/articles/PMC7924484/ /pubmed/33816809 http://dx.doi.org/10.7717/peerj-cs.156 Text en ©2018 Santamaría and Mihaljević http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
title | Comparison and benchmark of name-to-gender inference services |
topic | Data Mining and Machine Learning |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924484/ https://www.ncbi.nlm.nih.gov/pubmed/33816809 http://dx.doi.org/10.7717/peerj-cs.156 |