
Evaluating ranking methods on heterogeneous digital library collections

Bibliographic Details
Main author:	Canévet, Olivier
Language:	eng
Published:	2012
Subjects:	Computing and Computers
Online access:	http://cds.cern.ch/record/1479357

author	Canévet, Olivier
collection	CERN
description	As part of its research in particle physics, CERN has been developing its own web-based software, /Invenio/, to run the digital library of all documents related to CERN and fundamental physics. The documents (articles, photos, news, theses, ...) can be retrieved through a search engine. The results matching a user's query can be displayed in several ways: sorted by most recent first, by author or by title, or ranked by word similarity. The purpose of this project is to study and implement a new ranking method in Invenio: distributed-ranking (D-Rank). This method aggregates several scores produced by different ranking methods into a single new score. In addition to query-related scores such as word similarity, the work aims to take into account non-query-related scores such as citations, journal impact factor and, in particular, scores related to how often a document is accessed in the database. The idea is that, of two equally query-relevant documents, the one that has been downloaded more often, for instance, should be displayed ahead of the other. The approach we studied uses /logistic regression/ as the aggregation process, which combines the scores to be aggregated through a weighted sum passed through the logistic function. Optimal weights can usually be computed from the data. In our case, we used user feedback: search activity (queries made, documents displayed and downloaded, ...) was recorded for six months, and we divided this data set into two parts, one to estimate the optimal coefficients and the other to test them. The test consisted of re-ranking the users' queries with the optimal coefficients and comparing the results with the initial ranking to see whether the documents clicked at the time were ranked higher. The optimal coefficients obtained are coherent in the sense that negative attributes of a document received a negative coefficient in the logistic formula. However, the relative orders of magnitude of the logistic coefficients were unexpected, as the weight of the query-relevance score was much lower than the other weights. Re-ranking the queries showed some improvement: records that had already been downloaded in the database were ranked higher.
id	cern-1479357
institution	European Organization for Nuclear Research
language	eng
publishDate	2012
record_format	invenio
spelling	cern-1479357 · 2019-09-30T06:29:59Z · http://cds.cern.ch/record/1479357 · eng · Canévet, Olivier · Evaluating ranking methods on heterogeneous digital library collections · Computing and Computers · CERN-THESIS-2012-121 · oai:cds.cern.ch:1479357 · 2012-09-21T14:36:02Z
title	Evaluating ranking methods on heterogeneous digital library collections
topic	Computing and Computers
url	http://cds.cern.ch/record/1479357
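The aggregation and evaluation procedure described in the abstract can be illustrated with a minimal Python sketch. The record does not state the exact features, solver or evaluation measure, so everything here is an assumption: each logged (query, displayed record) pair is reduced to three hypothetical scores (word_similarity, citation_score, download_score) and labelled 1 if the record was downloaded; the logistic weights are fitted by plain batch gradient descent on the log-loss (the thesis may have used a different solver); and the held-out queries are re-ranked to count how many downloaded records move up. FEATURES, Row, fit_logistic, rerank and clicked_moves are illustrative names, not Invenio functions, and the toy logs are invented.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical score order for each logged (query, displayed record) pair.
FEATURES = ["word_similarity", "citation_score", "download_score"]

# (scores, 1 if the user downloaded the displayed record, else 0)
Row = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(weights: Sequence[float], bias: float, scores: Sequence[float]) -> float:
    """Weighted sum of the individual ranking scores plus an intercept."""
    return bias + sum(w * s for w, s in zip(weights, scores))


def fit_logistic(rows: List[Row], lr: float = 0.5,
                 epochs: int = 2000) -> Tuple[List[float], float]:
    """Estimate weights and bias by batch gradient descent on the mean log-loss."""
    n, k = len(rows), len(rows[0][0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for scores, label in rows:
            # d(log-loss)/d(linear term) = predicted probability - label
            err = sigmoid(linear(weights, bias, scores)) - label
            for j, s in enumerate(scores):
                grad_w[j] += err * s
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def rerank(results: List[Row], weights: Sequence[float], bias: float) -> List[int]:
    """Indices of one logged result list, highest aggregated score first.

    The logistic function is monotonic, so sorting by the linear term gives the
    same order as sorting by the predicted probability. sorted() is stable, so
    tied records keep their original display order.
    """
    return sorted(range(len(results)),
                  key=lambda i: linear(weights, bias, results[i][0]),
                  reverse=True)


def clicked_moves(queries: List[List[Row]], weights: Sequence[float],
                  bias: float) -> Dict[str, int]:
    """Count downloaded records moved up, down, or left in place by re-ranking."""
    moves = {"up": 0, "down": 0, "same": 0}
    for results in queries:
        for new_pos, old_pos in enumerate(rerank(results, weights, bias)):
            if not results[old_pos][1]:
                continue  # only records the user actually downloaded
            if new_pos < old_pos:
                moves["up"] += 1
            elif new_pos > old_pos:
                moves["down"] += 1
            else:
                moves["same"] += 1
    return moves


# Invented toy logs (not thesis data): a training half and a held-out test half.
train = [([0.9, 0.1, 0.8], 1), ([0.8, 0.2, 0.1], 0), ([0.3, 0.5, 0.9], 1),
         ([0.7, 0.1, 0.0], 0), ([0.2, 0.3, 0.7], 1), ([0.6, 0.4, 0.2], 0)]
test = [[([0.9, 0.2, 0.0], 0), ([0.8, 0.1, 0.1], 0), ([0.7, 0.3, 0.9], 1)],
        [([0.5, 0.0, 0.5], 1), ([0.4, 0.0, 0.1], 0)]]

weights, bias = fit_logistic(train)
print(dict(zip(FEATURES, (round(w, 2) for w in weights))), round(bias, 2))
print(clicked_moves(test, weights, bias))
```

Counting up/down/same moves is only one simple way to compare the re-ranked lists with the initial ranking; a mean position of clicked records would work equally well. As a check of the comparison logic alone, hand-picked weights [1.0, 0.5, 2.0] with bias -1.0 give clicked_moves(test, [1.0, 0.5, 2.0], -1.0) == {'up': 1, 'down': 0, 'same': 1}: the downloaded record in the first query moves from third to first place, and the one in the second query stays on top.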


Similar items