Authormagic: Ein Konzept zur Autorenidentifikation in Großen Digitalen Bibliotheken

Bibliographic Details
Main Author: Weiler, Henning
Language: eng
Published: U. Erlangen-Nuremberg 2012
Subjects:
Online Access: http://cds.cern.ch/record/1429146
Description
Summary: Author name ambiguities distort the quality of information discovery in digital libraries. These ambiguities also contribute to the inaccurate attribution of authorship to individual researchers. The latter is especially delicate in research evaluation. To solve this issue, many algorithmic bulk disambiguation approaches have been proposed in the literature. However, no algorithmic approach can resolve author ambiguities with an accuracy of 100%. Some online projects allow users to manually create publication lists, which are then regarded as profiles of the researchers. The tedious work of manually assembling such publication lists and the unavailability of scientific material in these projects limit their success. The “Authormagic” concept is developed in this thesis to address the author ambiguity issue with a hybrid approach that combines algorithmic and human intelligence. A customized agglomerative clustering approach first determines publication clusters by comparing available metadata. These clusters ideally represent publication profiles of authors. Users of the digital library can then use an interface to make decisions about the correctness of the algorithmic attributions. Every (operator-approved) decision feeds back into the algorithm to increase the overall matching quality in subsequent runs of the algorithm. The concept also targets the need for sustainable disambiguation solutions that are capable of rapidly updating information in an ever-growing publication landscape. Dedicated online processes incrementally update the cluster information, while an offline process continuously re-clusters information. All processes are constrained by user decisions, which are treated as unquestionable and invariable. The Authormagic concept is demonstrated using the example of INSPIRE, a hand-curated database containing the literature corpus of the entire field of High-Energy Physics (HEP). 
The metadata in INSPIRE is a strong basis for the algorithmic part, while a data-quality-conscious community drives the crowd-sourced intelligence acquisition. The algorithm results are evaluated in comparison to the decisions of users. The evaluation results show that the algorithmic approach is an improvement over non-disambiguated searches. The created author profiles contain more accurate publication and bibliometric statistics than before the disambiguation. Overall, it can be stated that the concept of combining algorithmic and human intelligence can lead to 100% correct author information, if all researchers participate in the decision-making process. The identified requirements for the Authormagic concept to be successfully implemented in a digital library are: 1) high-quality and complete metadata and 2) a participating community. The data quality achieved, in combination with the proposed sustainability strategy, paves the way for novel author-centric services and meaningful bibliometrics.
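
Illustration (not part of the original record): the hybrid idea described in the summary, agglomerative clustering of publications that is constrained by user decisions, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the algorithm from the thesis: the Jaccard overlap of co-author names as the similarity measure, the single-linkage rule, the 0.3 threshold, and all function and parameter names (`cluster_papers`, `confirmed`, `rejected`) are hypothetical choices made for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def violates(c1, c2, rejected):
    """True if merging c1 and c2 would join a pair that a user rejected."""
    return any((a in c1 and b in c2) or (a in c2 and b in c1)
               for a, b in rejected)


def similarity(c1, c2, papers):
    """Single linkage: the best pairwise similarity between two clusters."""
    return max(jaccard(papers[p], papers[q]) for p in c1 for q in c2)


def cluster_papers(papers, confirmed=(), rejected=(), threshold=0.3):
    """Greedy agglomerative clustering constrained by user decisions.

    papers:    dict of paper id -> set of co-author names (assumed metadata)
    confirmed: pairs of paper ids a user assigned to the same author
    rejected:  pairs of paper ids a user assigned to different authors
    """
    clusters = [{pid} for pid in sorted(papers)]

    def merge(i, j):
        # Requires i < j so that index i stays valid after the deletion.
        clusters[i] |= clusters[j]
        del clusters[j]

    # User-confirmed pairs are merged unconditionally, before any
    # similarity comparison (this sketch does not check them against
    # the rejected pairs).
    for a, b in confirmed:
        i = next(k for k, c in enumerate(clusters) if a in c)
        j = next(k for k, c in enumerate(clusters) if b in c)
        if i != j:
            merge(min(i, j), max(i, j))

    # Repeatedly merge the most similar pair of clusters that reaches the
    # threshold and does not contradict a user rejection.
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            if violates(clusters[i], clusters[j], rejected):
                continue
            s = similarity(clusters[i], clusters[j], papers)
            if s >= threshold and (best is None or s > best[0]):
                best = (s, i, j)
        if best is None:
            return clusters
        merge(best[1], best[2])


# Hypothetical records: paper id -> co-author names.
papers = {
    "p1": {"A", "B"},
    "p2": {"A", "B", "C"},
    "p3": {"X"},
    "p4": {"C", "D"},
}
unconstrained = cluster_papers(papers)
with_claim = cluster_papers(papers, confirmed=[("p3", "p1")])
with_rejection = cluster_papers(papers, rejected=[("p1", "p2")])
```

In this sketch, a confirmed pair forces two papers together even when their metadata shares nothing, and a rejected pair blocks a merge that the similarity alone would have made. That mirrors, in simplified form, the summary's statement that all processes are constrained by user decisions.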