
Identifying author heritage using surname data: An application for Russian surnames

Bibliographic details
Main authors: Karaulova, Maria, Gök, Abdullah, Shapira, Philip
Format: Online Article Text
Language: English
Published: John Wiley & Sons, Inc. 2019
Subjects: Research Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6853192/
https://www.ncbi.nlm.nih.gov/pubmed/31763359
http://dx.doi.org/10.1002/asi.24104
author Karaulova, Maria
Gök, Abdullah
Shapira, Philip
collection PubMed
description This research article puts forward a method to identify the national heritage of authors based on the morphology of their surnames. Most studies in the field use variants of dictionary‐based surname methods to identify ethnic communities, an approach that suffers from methodological limitations. Using the public file of ORCID (Open Researcher and Contributor ID) identifiers in 2015, we developed a surname‐based identification method and applied it to infer Russian heritage from suffix‐based morphological regularities. The method was developed conceptually and tested in an undersampled control set. Identification based on surname morphology was then complemented by using first‐name data to eliminate false‐positive results. The method achieved 98% precision and 94% recall rates—superior to most other methods that use name data. The procedure can be adapted to identify the heritage of a variety of national groups with morphologically regular naming traditions. We elaborate on how the method can be employed to overcome long‐standing limitations of using name data in bibliometric datasets. This identification method can contribute to advancing research in scientific mobility and migration, patenting by certain groups, publishing and collaboration, transnational and scientific diaspora links, and the effects of diversity on the innovative performance of organizations, regions, and countries.
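The description outlines a two-stage procedure: flag surnames by suffix-based morphology, then use first-name data to discard false positives, and evaluate the result by precision and recall. The Python sketch below illustrates only that general shape. The suffix tuple, the first-name set, the `Author` type and all function names are hypothetical and chosen for illustration; they are not the rules, name lists or control set used by the authors.

```python
from __future__ import annotations

from dataclasses import dataclass

# HYPOTHETICAL illustrative suffixes; not the suffix rules from the paper.
RUSSIAN_SURNAME_SUFFIXES = ("ova", "ov", "eva", "ev", "ina", "in",
                            "skaya", "sky", "skiy", "skii")

# HYPOTHETICAL first-name list used only to filter out surname false positives.
RUSSIAN_FIRST_NAMES = {"maria", "ivan", "olga", "sergei", "sergey",
                       "elena", "dmitry", "natalia", "alexei", "anna"}


@dataclass
class Author:
    first_name: str
    surname: str


def matches_surname_suffix(surname: str) -> bool:
    """Stage 1: flag a surname that ends in one of the assumed suffixes."""
    return surname.strip().lower().endswith(RUSSIAN_SURNAME_SUFFIXES)


def is_russian_heritage(author: Author) -> bool:
    """Stage 2: keep a surname match only if the first name is also on the list."""
    if not matches_surname_suffix(author.surname):
        return False
    return author.first_name.strip().lower() in RUSSIAN_FIRST_NAMES


def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    authors = [Author("Maria", "Karaulova"), Author("Philip", "Shapira"),
               Author("Steve", "Martin"), Author("Ivan", "Petrov"),
               Author("Sasha", "Ivanova")]
    # Hand-assigned labels for this toy example only.
    actual = [True, False, False, True, True]
    predicted = [is_russian_heritage(a) for a in authors]
    print(predicted)                            # [True, False, False, True, False]
    print(precision_recall(predicted, actual))  # (1.0, 0.6666666666666666)
```

In this toy run, "Martin" matches the `in` suffix but is rejected by the first-name stage, which is the kind of false positive the second step is meant to remove. "Ivanova" is missed only because the hypothetical first-name list is incomplete, which lowers recall.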
format Online
Article
Text
id pubmed-6853192
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher John Wiley & Sons, Inc.
record_format MEDLINE/PubMed
spelling pubmed-6853192 2019-11-21 J Assoc Inf Sci Technol Research Articles John Wiley & Sons, Inc. 2019-01-25 2019-05 /pmc/articles/PMC6853192/ /pubmed/31763359 http://dx.doi.org/10.1002/asi.24104 Text en © 2019 The Authors. Journal of the Association for Information Science and Technology published by Wiley Periodicals, Inc. on behalf of ASIS&T.
This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Identifying author heritage using surname data: An application for Russian surnames
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6853192/
https://www.ncbi.nlm.nih.gov/pubmed/31763359
http://dx.doi.org/10.1002/asi.24104