Ranking Scientific Publications Based on Their Citation Graph

CDS Invenio is the web-based integrated digital library system developed at CERN. It is a suite of applications which provides the framework and tools for building and managing an autonomous digital library server. Within this framework, the goal of this project is to implement new ranking methods b...

Bibliographic Details
Main author: Marian, L
Language: eng
Published: EPFL. Lausanne 2009
Subjects: Computing and Computers
Online access: http://cds.cern.ch/record/1172366
_version_ 1780916160562200576
author Marian, L
author_facet Marian, L
author_sort Marian, L
collection CERN
description CDS Invenio is the web-based integrated digital library system developed at CERN. It is a suite of applications which provides the framework and tools for building and managing an autonomous digital library server. Within this framework, the goal of this project is to implement new ranking methods based on the bibliographic citation graph extracted from the CDS Invenio database. As a first step, we implemented the Citation Count as a baseline ranking method. The major disadvantage of this method is that all citations are treated equally, disregarding their importance and their publication date. To overcome this drawback, we consider two different approaches: a link-based approach which extends the PageRank model to the bibliographic citation graph, and a time-dependent approach which takes time into account in the citation counts. In addition, we combined these two approaches in a hybrid model based on a time-dependent PageRank. In the present document, we describe the conceptual background behind our new ranking methods, detail their implementation and provide a comprehensive analysis of the results obtained with the citation graph extracted from the CDS Invenio database. Our main contributions are: (i) a study of the currently available ranking methods based on a citation graph; (ii) the development of new ranking methods that correct some of the identified limitations of the current methods, such as considering all citations of equal importance, not taking time into account, or considering the citation graph as complete; (iii) a robust and scalable implementation of the aforementioned ranking methods; (iv) a detailed study of their key parameters. Our study reveals why the damping factor used by the PageRank algorithm is not suited for ranking bibliographic data and why adding even a weak time decay factor still has a strong impact on the final ordering of the documents.
id cern-1172366
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2009
publisher EPFL. Lausanne
record_format invenio
spelling cern-1172366 2019-09-30T06:29:59Z http://cds.cern.ch/record/1172366 eng Marian, L Ranking Scientific Publications Based on Their Citation Graph Computing and Computers EPFL. Lausanne CERN-THESIS-2009-029 oai:cds.cern.ch:1172366 2009
spellingShingle Computing and Computers
Marian, L
Ranking Scientific Publications Based on Their Citation Graph
title Ranking Scientific Publications Based on Their Citation Graph
title_full Ranking Scientific Publications Based on Their Citation Graph
title_fullStr Ranking Scientific Publications Based on Their Citation Graph
title_full_unstemmed Ranking Scientific Publications Based on Their Citation Graph
title_short Ranking Scientific Publications Based on Their Citation Graph
title_sort ranking scientific publications based on their citation graph
topic Computing and Computers
url http://cds.cern.ch/record/1172366
work_keys_str_mv AT marianl rankingscientificpublicationsbasedontheircitationgraph
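
The three families of ranking methods named in the description (citation count, a time-dependent citation count, and a PageRank-style link-based score) can be illustrated on a toy graph. The sketch below is only an illustration under stated assumptions: the record does not give the thesis's formulas, so the exponential decay, the uniform handling of papers that cite nothing, the sample data and all function names are hypothetical choices, not the author's implementation. The hybrid time-dependent PageRank is not shown.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (citing paper, cited paper, year of the citing paper).
CITATIONS = [
    ("B", "A", 2001),
    ("C", "A", 2005),
    ("C", "B", 2005),
    ("D", "C", 2008),
]
PAPERS = ["A", "B", "C", "D"]


def citation_count(citations):
    """Baseline: every citation contributes exactly 1 to the cited paper."""
    counts = defaultdict(float)
    for _citing, cited, _year in citations:
        counts[cited] += 1.0
    return counts


def time_decayed_count(citations, now, decay):
    """Each citation weighs exp(-decay * age in years).

    The exponential form is an assumption made for this sketch.
    """
    scores = defaultdict(float)
    for _citing, cited, year in citations:
        scores[cited] += math.exp(-decay * (now - year))
    return scores


def pagerank(papers, citations, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    Papers that cite nothing (dangling nodes) spread their rank
    uniformly over all papers, which is one common convention.
    """
    out_links = {p: [] for p in papers}
    for citing, cited, _year in citations:
        out_links[citing].append(cited)

    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in papers if not out_links[p])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in papers}
        for citing, targets in out_links.items():
            for cited in targets:
                new_rank[cited] += damping * rank[citing] / len(targets)
        rank = new_rank
    return rank


def ordering(scores, papers):
    """Papers sorted by descending score; ties are broken by name."""
    return sorted(papers, key=lambda p: (-scores.get(p, 0.0), p))


if __name__ == "__main__":
    print(ordering(citation_count(CITATIONS), PAPERS))
    print(ordering(time_decayed_count(CITATIONS, now=2009, decay=0.5), PAPERS))
    print(ordering(pagerank(PAPERS, CITATIONS), PAPERS))
```

Running the script prints one ordering per method, so the effect of the `decay` and `damping` parameters on the final ranking can be explored by changing their values.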