Enhancing Author Information for CERN Document Server: Creating an Author Collection and Using Author Disambiguation Methods

Bibliographic Details
Main author: Klein, Jochen
Language: English
Published: 2016
Subjects: Computing and Computers
Online access: http://cds.cern.ch/record/2203031
author Klein, Jochen
collection CERN
description Authors are a substantial part of queries in digital libraries, and the results of these queries reflect the quality and success of the service. Ambiguous author names can confuse users and lead to inaccurate links between authorships and individual researchers. Providing a set of disambiguated authors is challenging and is closely related to data integration, since disambiguation is done in several ways and by different systems, both manually and automatically. Many disambiguation algorithms have been proposed in the literature, and most of them resolve ambiguities by applying machine learning techniques; however, the problem cannot be solved with 100% accuracy. The contributions to the CERN Document Server presented in this work consist of two parts: first, we create and deploy an author knowledge database (collection); second, we link authors of bibliographic records back to their authority records. For the latter, we use a library that provides machine learning tools for clustering (using trained data from INSPIRE, a High-Energy Physics literature database developed at CERN) and construct an algorithm that builds the relation based on authority ID and name matching. We were able to attribute 30% of 9 million authors to almost 9,500 individuals; this result is limited by our current author collection, which contains more than 41,000 records (and counting) of people affiliated with the organization.
id cern-2203031
institution European Organization for Nuclear Research
language eng
publishDate 2016
record_format invenio
spelling cern-2203031; 2019-09-30T06:29:59Z; http://cds.cern.ch/record/2203031; eng; Klein, Jochen; Computing and Computers; CERN-THESIS-2016-082; oai:cds.cern.ch:2203031; 2016-08-01T11:20:31Z
title Enhancing Author Information for CERN Document Server: Creating an Author Collection and Using Author Disambiguation Methods
topic Computing and Computers
url http://cds.cern.ch/record/2203031
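The linking step summarized in the abstract (relating author entries to authority records by authority ID and by name matching) can be illustrated with a small sketch. This is not the thesis's implementation: the record classes, the identifiers such as AUTH-1 and ID-100, the accent-free "last, first-initial" name key, and the rule that ambiguous name matches stay unlinked are all hypothetical assumptions. The machine learning clustering step trained on INSPIRE data is omitted entirely.

```python
import unicodedata
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class AuthorityRecord:
    """Hypothetical entry in the author collection (one per person)."""
    record_id: str
    name: str  # assumed "Last, First" form
    external_ids: FrozenSet[str] = frozenset()  # hypothetical person identifiers


@dataclass(frozen=True)
class AuthorEntry:
    """Hypothetical author field taken from a bibliographic record."""
    name: str
    external_id: Optional[str] = None


def name_key(name: str) -> str:
    """Assumed matching key: accent-free, lower-case "last, first-initial"."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    last, _, first = plain.partition(",")
    last = " ".join(last.lower().split())
    initial = first.strip().lower()[:1]
    return f"{last}, {initial}" if initial else last


def build_indices(
    records: Iterable[AuthorityRecord],
) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Index authority records by identifier and by name key."""
    by_id: Dict[str, str] = {}
    by_name: Dict[str, Set[str]] = {}
    for rec in records:
        for ext in rec.external_ids:
            by_id[ext] = rec.record_id  # duplicate identifiers: last record wins
        by_name.setdefault(name_key(rec.name), set()).add(rec.record_id)
    return by_id, by_name


def link(entry: AuthorEntry, by_id: Dict[str, str],
         by_name: Dict[str, Set[str]]) -> Optional[str]:
    """Return the authority record ID for an author entry, or None."""
    # 1. An identifier shared with an authority record is decisive.
    if entry.external_id is not None and entry.external_id in by_id:
        return by_id[entry.external_id]
    # 2. Otherwise fall back to the name key, but only if it is unambiguous.
    candidates = by_name.get(name_key(entry.name), set())
    if len(candidates) == 1:
        return next(iter(candidates))
    return None  # unknown or ambiguous: left unlinked


records = [
    AuthorityRecord("AUTH-1", "Klein, Jochen", frozenset({"ID-100"})),
    AuthorityRecord("AUTH-2", "Müller, Anna"),
    AuthorityRecord("AUTH-3", "Müller, Andreas"),
]
by_id, by_name = build_indices(records)

entries = [
    AuthorEntry("Klein, J.", "ID-100"),  # identifier match -> AUTH-1
    AuthorEntry("Klein, J."),            # unique name key "klein, j" -> AUTH-1
    AuthorEntry("Muller, A."),           # "muller, a" fits AUTH-2 and AUTH-3 -> None
    AuthorEntry("Smith, John"),          # no authority record -> None
]
for entry in entries:
    print(entry.name, "->", link(entry, by_id, by_name))
```

Leaving ambiguous name matches unlinked favors precision over recall; in a fuller pipeline, such leftovers would be candidates for a clustering step or manual review.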