Enhancing Author Information for CERN Document Server: Creating an Author Collection and Using Author Disambiguation Methods
Author names make up a substantial share of queries in digital libraries, and the results of those queries reflect the quality and success of the service. Ambiguous author names can confuse users and lead to incorrect links between authorships and individual researchers. Providing a set of disambiguated authors is chal...
Main author: | Klein, Jochen |
---|---|
Language: | eng |
Published: | 2016 |
Subjects: | Computing and Computers |
Online access: | http://cds.cern.ch/record/2203031 |
_version_ | 1780951344149954560 |
---|---|
author | Klein, Jochen |
author_facet | Klein, Jochen |
author_sort | Klein, Jochen |
collection | CERN |
description | Author names make up a substantial share of queries in digital libraries, and the results of those queries reflect the quality and success of the service. Ambiguous author names can confuse users and lead to incorrect links between authorships and individual researchers. Providing a set of disambiguated authors is challenging and is related to data integration, since disambiguation is done in several ways and by different systems, both manually and automatically. Many disambiguation algorithms have been proposed in the literature, and most of them resolve the ambiguities by applying machine learning techniques. However, such problems cannot be solved with an accuracy of 100%. Our contributions to the CERN Document Server presented in this work consist of two parts: first, we create and deploy an author knowledge database (collection), and second, we link authors of bibliographic records back to their authority records. For the latter, we use a library providing machine learning tools for clustering (where we use trained data from INSPIRE, a High-Energy Physics literature database developed at CERN) and construct an algorithm that builds the relation based on authority id and name matching. We were able to attribute 30% of 9 million authors to almost 9'500 individuals. This result is limited by our current author collection, which contains more than 41'000 records (and counting) and is based on people affiliated with the organization. |
id | cern-2203031 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2016 |
record_format | invenio |
spelling | cern-22030312019-09-30T06:29:59Zhttp://cds.cern.ch/record/2203031engKlein, JochenEnhancing Author Information for CERN Document Server: Creating an Author Collection and Using Author Disambiguation MethodsComputing and ComputersAuthors are a substantial part of queries in digital libraries, where the results are reflecting the service quality and success. Ambiguous author names can confuse users and cause an inaccurate relation between authorships and individual researchers. Providing a set of disambiguated authors is challenging and related to data integration, since this is done in several ways and by different systems, both manually and automatically. Many disambiguation algorithms have been proposed in the literature, where the most solutions are solving the ambiguities by applying machine learning techniques. However, such problems cannot be solved with an accuracy of 100%. Our contributions to the CERN Document Server presented in this work consists of two parts: first, we create and deploy an author knowledge data base (collection) and second, we link authors of bibliographic records back to their authority records. For the latter, we use a library providing machine learning tools for clustering (where we use trained data from INSPIRE---a High-Energy Physics literature database developed at CERN) and construct an algorithm to build the relation, based on authority id and name matching. We could attribute 30% of 9 million authors to almost 9'500 individuals, which is also limited to our current author collection containing more than 41'000 records (and counting), based on people affiliated to the organization.CERN-THESIS-2016-082oai:cds.cern.ch:22030312016-08-01T11:20:31Z |
spellingShingle | Computing and Computers Klein, Jochen Enhancing Author Information for CERN Document Server: Creating an Author Collection and Using Author Disambiguation Methods |
title | Enhancing Author Information for CERN Document Server: Creating an Author Collection and Using Author Disambiguation Methods |
title_full | Enhancing Author Information for CERN Document Server: Creating an Author Collection and Using Author Disambiguation Methods |
title_fullStr | Enhancing Author Information for CERN Document Server: Creating an Author Collection and Using Author Disambiguation Methods |
title_full_unstemmed | Enhancing Author Information for CERN Document Server: Creating an Author Collection and Using Author Disambiguation Methods |
title_short | Enhancing Author Information for CERN Document Server: Creating an Author Collection and Using Author Disambiguation Methods |
title_sort | enhancing author information for cern document server: creating an author collection and using author disambiguation methods |
topic | Computing and Computers |
url | http://cds.cern.ch/record/2203031 |
work_keys_str_mv | AT kleinjochen enhancingauthorinformationforcerndocumentservercreatinganauthorcollectionandusingauthordisambiguationmethods |
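
The description mentions an algorithm that links authors of bibliographic records to authority records "based on authority id and name matching". The record does not say how that algorithm works, so the following is only a minimal sketch of one plausible reading: use the authority id when the record carries one, otherwise fall back to a normalized name comparison and accept it only when exactly one authority record matches. All names here (`AuthorityRecord`, `Signature`, `normalize`, `link`) and the sample ids are hypothetical and are not taken from the thesis or from the CERN Document Server code.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AuthorityRecord:
    authority_id: str  # hypothetical identifier of an entry in the author collection
    name: str          # name in "Family, Given" form


@dataclass(frozen=True)
class Signature:
    name: str                           # author name as it appears on a bibliographic record
    authority_id: Optional[str] = None  # set only if the record already carries an id


def normalize(name: str) -> str:
    """Lowercase, strip accents and dots, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().replace(".", " ").split())


def link(signature: Signature, authorities: Sequence[AuthorityRecord]) -> Optional[AuthorityRecord]:
    """Return the matching authority record, or None if there is no safe match."""
    # Step 1 (assumed): an authority id on the record is the strongest evidence.
    by_id = {a.authority_id: a for a in authorities}
    if signature.authority_id in by_id:
        return by_id[signature.authority_id]

    # Step 2 (assumed): fall back to exact matching on normalized names,
    # and refuse to guess when several authority records share the name.
    target = normalize(signature.name)
    candidates = [a for a in authorities if normalize(a.name) == target]
    return candidates[0] if len(candidates) == 1 else None


# Made-up sample data for illustration only.
AUTHORITIES = [
    AuthorityRecord("A1", "Klein, Jochen"),
    AuthorityRecord("A2", "Müller, Anna"),
    AuthorityRecord("A3", "Smith, J."),
    AuthorityRecord("A4", "Smith, J"),
]

print(link(Signature("Muller, Anna"), AUTHORITIES))
print(link(Signature("Smith, J."), AUTHORITIES))
```

Returning `None` for ambiguous names is a deliberate choice in this sketch: it leaves such authorships unlinked rather than attaching them to the wrong person, which is consistent with the description's remark that only part of the authors could be attributed.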