Authormagic: Ein Konzept zur Autorenidentifikation in Großen Digitalen Bibliotheken

Author name ambiguities distort the quality of information discovery in digital libraries. These ambiguities also contribute to the inaccurate attribution of authorship to individual researchers. The latter is especially delicate in research evaluation. To solve this issue, many algorithmic bulk dis...

Bibliographic Details
Main Author: Weiler, Henning
Language: eng
Published: U. Erlangen-Nuremberg 2012
Subjects: Information Transfer and Management
Online Access: http://cds.cern.ch/record/1429146
_version_ 1780924314336362496
author Weiler, Henning
author_facet Weiler, Henning
author_sort Weiler, Henning
collection CERN
description Author name ambiguities degrade the quality of information discovery in digital libraries. They also contribute to the inaccurate attribution of authorship to individual researchers, which is especially delicate in research evaluation. To solve this issue, many algorithmic bulk disambiguation approaches have been proposed in the literature. However, no algorithmic approach can resolve author ambiguities with an accuracy of 100%. Some online projects allow users to manually create publication lists, which are then regarded as profiles of the researchers. The tedious work of manually assembling such publication lists and the unavailability of scientific material in these projects limit their success. The “Authormagic” concept is developed in this thesis to address the author ambiguity issue with a hybrid approach that combines algorithmic and human intelligence. A customized agglomerative clustering approach first determines publication clusters by comparing available metadata. These clusters ideally represent publication profiles of authors. Users of the digital library can then use an interface to confirm or reject the algorithmic attributions. Every (operator-approved) decision feeds back into the algorithm to increase the overall matching quality in subsequent runs. The concept also targets the need for sustainable disambiguation solutions that are capable of rapidly updating information in an ever-growing publication landscape. Dedicated online processes incrementally update the cluster information, while an offline process continuously re-clusters information. All processes are constrained by user decisions, which are treated as authoritative and immutable. The Authormagic concept is demonstrated using INSPIRE, a hand-curated database containing the literature corpus of the entire field of High-Energy Physics (HEP). The metadata in INSPIRE provides a strong basis for the algorithmic part, while a data-quality-conscious community drives the crowd-sourced intelligence acquisition. The algorithm results are evaluated against the decisions of users. The evaluation results show that the algorithmic approach is an improvement over non-disambiguated searches. The created author profiles contain more accurate publication and bibliometric statistics than before the disambiguation. Overall, it can be stated that the concept of combining algorithmic and human intelligence can lead to 100% correct author information if all researchers participate in the decision-making process. The identified requirements for Authormagic to be successfully implemented in a digital library are: 1) high-quality and complete metadata and 2) a participating community. The data quality achieved, in combination with the proposed sustainability strategy, makes way for novel author-centric services and meaningful bibliometrics.
id cern-1429146
institution European Organization for Nuclear Research
language eng
publishDate 2012
publisher U. Erlangen-Nuremberg
record_format invenio
spelling cern-1429146 2019-09-30T06:29:59Z http://cds.cern.ch/record/1429146 eng Weiler, Henning Authormagic: Ein Konzept zur Autorenidentifikation in Großen Digitalen Bibliotheken Information Transfer and Management U. Erlangen-Nuremberg CERN-THESIS-2012-013 oai:cds.cern.ch:1429146 2012
spellingShingle Information Transfer and Management
Weiler, Henning
Authormagic: Ein Konzept zur Autorenidentifikation in Großen Digitalen Bibliotheken
title Authormagic: Ein Konzept zur Autorenidentifikation in Großen Digitalen Bibliotheken
title_full Authormagic: Ein Konzept zur Autorenidentifikation in Großen Digitalen Bibliotheken
title_fullStr Authormagic: Ein Konzept zur Autorenidentifikation in Großen Digitalen Bibliotheken
title_full_unstemmed Authormagic: Ein Konzept zur Autorenidentifikation in Großen Digitalen Bibliotheken
title_short Authormagic: Ein Konzept zur Autorenidentifikation in Großen Digitalen Bibliotheken
title_sort authormagic: ein konzept zur autorenidentifikation in großen digitalen bibliotheken
topic Information Transfer and Management
url http://cds.cern.ch/record/1429146
work_keys_str_mv AT weilerhenning authormagiceinkonzeptzurautorenidentifikationingroßendigitalenbibliotheken
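
The hybrid approach summarized in the description field (agglomerative clustering of publications by metadata, constrained by user decisions) can be illustrated with a minimal sketch. This is not the thesis's implementation: the Jaccard similarity, the single-linkage merge rule, the threshold value, and the names `cluster_papers`, `confirmed`, and `rejected` are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_papers(papers, confirmed=(), rejected=(), threshold=0.3):
    """Greedy single-linkage agglomerative clustering under user constraints.

    papers:    dict mapping a paper id to a set of metadata tokens
               (hypothetical features such as coauthors or affiliations)
    confirmed: pairs of paper ids a user marked as the same author
    rejected:  pairs of paper ids a user marked as different authors
    Contradictory user decisions are not detected in this sketch.
    """
    clusters = [{pid} for pid in papers]
    forbidden = {frozenset(pair) for pair in rejected}

    def find(pid):
        return next(c for c in clusters if pid in c)

    def merge(c1, c2):
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)

    def allowed(c1, c2):
        # A merge is blocked if it would join any user-rejected pair.
        return not any(frozenset((p, q)) in forbidden for p in c1 for q in c2)

    # User-confirmed pairs are merged first and never split afterwards.
    for p, q in confirmed:
        c1, c2 = find(p), find(q)
        if c1 is not c2:
            merge(c1, c2)

    # Repeatedly merge the most similar allowed pair of clusters.
    while True:
        best, best_score = None, threshold
        for c1, c2 in combinations(clusters, 2):
            if not allowed(c1, c2):
                continue
            score = max(jaccard(papers[p], papers[q]) for p in c1 for q in c2)
            if score >= best_score:
                best, best_score = (c1, c2), score
        if best is None:
            return clusters
        merge(*best)
```

In this sketch, a rejected pair blocks a merge that metadata similarity alone would have made, and a confirmed pair forces one that similarity alone would have missed, which mirrors how the record describes user decisions constraining every run of the algorithm.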