Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning

In several author name disambiguation studies, some ethnic name groups such as East Asian names are reported to be more difficult to disambiguate than others. This implies that disambiguation approaches might be improved if ethnic name groups are distinguished before disambiguation. We explore the p...

Bibliographic Details
Main Authors: Kim, Jinseok, Kim, Jenna, Owen‐Smith, Jason
Format: Online Article Text
Language: English
Published: John Wiley & Sons, Inc. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8359369/
https://www.ncbi.nlm.nih.gov/pubmed/34414251
http://dx.doi.org/10.1002/asi.24459
_version_ 1783737533995155456
author Kim, Jinseok
Kim, Jenna
Owen‐Smith, Jason
author_facet Kim, Jinseok
Kim, Jenna
Owen‐Smith, Jason
author_sort Kim, Jinseok
collection PubMed
description In several author name disambiguation studies, some ethnic name groups such as East Asian names are reported to be more difficult to disambiguate than others. This implies that disambiguation approaches might be improved if ethnic name groups are distinguished before disambiguation. We explore the potential of ethnic name partitioning by comparing the performance of four machine learning algorithms trained and tested on the entire data or specifically on individual name groups. Results show that ethnicity‐based name partitioning can substantially improve disambiguation performance because the individual models are better suited for their respective name group. The improvements occur across all ethnic name groups with different magnitudes. Performance gains in predicting matched name pairs outweigh losses in predicting nonmatched pairs. Feature (e.g., coauthor name) similarities of name pairs vary across ethnic name groups. Such differences may enable the development of ethnicity‐specific feature weights to improve prediction for specific ethnic name categories. These findings are observed for three labeled datasets with a natural distribution of problem sizes as well as one in which all ethnic name groups are controlled for the same sizes of ambiguous names. This study is expected to motivate scholars to group author names based on ethnicity prior to disambiguation.
format Online
Article
Text
id pubmed-8359369
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher John Wiley & Sons, Inc.
record_format MEDLINE/PubMed
spelling pubmed-83593692021-08-17 Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning Kim, Jinseok Kim, Jenna Owen‐Smith, Jason J Assoc Inf Sci Technol Research Articles In several author name disambiguation studies, some ethnic name groups such as East Asian names are reported to be more difficult to disambiguate than others. This implies that disambiguation approaches might be improved if ethnic name groups are distinguished before disambiguation. We explore the potential of ethnic name partitioning by comparing the performance of four machine learning algorithms trained and tested on the entire data or specifically on individual name groups. Results show that ethnicity‐based name partitioning can substantially improve disambiguation performance because the individual models are better suited for their respective name group. The improvements occur across all ethnic name groups with different magnitudes. Performance gains in predicting matched name pairs outweigh losses in predicting nonmatched pairs. Feature (e.g., coauthor name) similarities of name pairs vary across ethnic name groups. Such differences may enable the development of ethnicity‐specific feature weights to improve prediction for specific ethnic name categories. These findings are observed for three labeled datasets with a natural distribution of problem sizes as well as one in which all ethnic name groups are controlled for the same sizes of ambiguous names. This study is expected to motivate scholars to group author names based on ethnicity prior to disambiguation. John Wiley & Sons, Inc. 2021-02-23 2021-08 /pmc/articles/PMC8359369/ /pubmed/34414251 http://dx.doi.org/10.1002/asi.24459 Text en © 2021 The Authors. Journal of the Association for Information Science and Technology published by Wiley Periodicals LLC on behalf of Association for Information Science and Technology.
This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Articles
Kim, Jinseok
Kim, Jenna
Owen‐Smith, Jason
Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning
title Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning
title_full Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning
title_fullStr Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning
title_full_unstemmed Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning
title_short Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning
title_sort ethnicity‐based name partitioning for author name disambiguation using supervised machine learning
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8359369/
https://www.ncbi.nlm.nih.gov/pubmed/34414251
http://dx.doi.org/10.1002/asi.24459
work_keys_str_mv AT kimjinseok ethnicitybasednamepartitioningforauthornamedisambiguationusingsupervisedmachinelearning
AT kimjenna ethnicitybasednamepartitioningforauthornamedisambiguationusingsupervisedmachinelearning
AT owensmithjason ethnicitybasednamepartitioningforauthornamedisambiguationusingsupervisedmachinelearning