Ranking Scientific Publications Based on Their Citation Graph
CDS Invenio is the web-based integrated digital library system developed at CERN. It is a suite of applications which provides the framework and tools for building and managing an autonomous digital library server. Within this framework, the goal of this project is to implement new ranking methods b...
Main author: | Marian, L |
---|---|
Language: | eng |
Published: | EPFL. Lausanne, 2009 |
Subjects: | Computing and Computers |
Online access: | http://cds.cern.ch/record/1172366 |
_version_ | 1780916160562200576 |
---|---|
author | Marian, L |
author_facet | Marian, L |
author_sort | Marian, L |
collection | CERN |
description | CDS Invenio is the web-based integrated digital library system developed at CERN. It is a suite of applications which provides the framework and tools for building and managing an autonomous digital library server. Within this framework, the goal of this project is to implement new ranking methods based on the bibliographic citation graph extracted from the CDS Invenio database. As a first step, we implemented the Citation Count as a baseline ranking method. The major disadvantage of this method is that all citations are treated equally, disregarding their importance and their publication date. To overcome this drawback, we consider two different approaches: a link-based approach which extends the PageRank model to the bibliographic citation graph and a time-dependent approach which takes into account time in the citation counts. In addition, we combined these two approaches in a hybrid model based on a time-dependent PageRank. In the present document, we describe the conceptual background behind our new ranking methods, detail their implementation and provide a comprehensive analysis of the results obtained with the citation graph extracted from the CDS Invenio database. Our main contributions are: (i) a study of the currently available ranking methods based on a citation graph; (ii) the development of new ranking methods that correct some of the identified limitations of the current methods, such as considering all citations of equal importance, not taking time into account, or considering the citation graph as complete; (iii) a robust and scalable implementation of the aforementioned ranking methods; (iv) a detailed study of their key parameters. Our study reveals why the damping factor used by the PageRank algorithm is not suited for ranking bibliographic data and why adding even a weak time decay factor still has a strong impact on the final ordering of the documents. |
id | cern-1172366 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2009 |
publisher | EPFL. Lausanne |
record_format | invenio |
spelling | cern-11723662019-09-30T06:29:59Zhttp://cds.cern.ch/record/1172366engMarian, LRanking Scientific Publications Based on Their Citation GraphComputing and ComputersCDS Invenio is the web-based integrated digital library system developed at CERN. It is a suite of applications which provides the framework and tools for building and managing an autonomous digital library server. Within this framework, the goal of this project is to implement new ranking methods based on the bibliographic citation graph extracted from the CDS Invenio database. As a first step, we implemented the Citation Count as a baseline ranking method. The major disadvantage of this method is that all citations are treated equally, disregarding their importance and their publication date. To overcome this drawback, we consider two different approaches: a link-based approach which extends the PageRank model to the bibliographic citation graph and a time-dependent approach which takes into account time in the citation counts. In addition, we combined these two approaches in a hybrid model based on a time-dependent PageRank. In the present document, we describe the conceptual background behind our new ranking methods, detail their implementation and provide a comprehensive analysis of the results obtained with the citation graph extracted from the CDS Invenio database. Our main contributions are: (i) a study of the currently available ranking methods based on a citation graph; (ii) the development of new ranking methods that correct some of the identified limitations of the current methods, such as considering all citations of equal importance, not taking time into account, or considering the citation graph as complete; (iii) a robust and scalable implementation of the aforementioned ranking methods; (iv) a detailed study of their key parameters. Our study reveals why the damping factor used by the PageRank algorithm is not suited for ranking bibliographic data and why adding even a weak time decay factor still has a strong impact on the final ordering of the documents.EPFL. LausanneCERN-THESIS-2009-029oai:cds.cern.ch:11723662009 |
spellingShingle | Computing and Computers Marian, L Ranking Scientific Publications Based on Their Citation Graph |
title | Ranking Scientific Publications Based on Their Citation Graph |
title_full | Ranking Scientific Publications Based on Their Citation Graph |
title_fullStr | Ranking Scientific Publications Based on Their Citation Graph |
title_full_unstemmed | Ranking Scientific Publications Based on Their Citation Graph |
title_short | Ranking Scientific Publications Based on Their Citation Graph |
title_sort | ranking scientific publications based on their citation graph |
topic | Computing and Computers |
url | http://cds.cern.ch/record/1172366 |
work_keys_str_mv | AT marianl rankingscientificpublicationsbasedontheircitationgraph |
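
The description above names three families of ranking methods: a plain Citation Count, a PageRank-style link-based score, and a time-dependent citation count. The following is a minimal Python sketch of what such scores could look like on a toy citation graph. It is not the thesis implementation: this record does not give the formulas, so the exponential decay, the choice to age each citation by the citing paper's publication year, the `decay` and `damping` defaults (0.85 is only the conventional PageRank value), and all function names and sample data are assumptions made for illustration.

```python
import math
from collections import defaultdict


def citation_count(citations):
    """Baseline: count incoming citations, all weighted equally."""
    counts = defaultdict(int)
    for _citing, cited in citations:
        counts[cited] += 1
    return dict(counts)


def time_decayed_count(citations, year_of, current_year, decay=0.1):
    """Weight each citation by exp(-decay * age of the citing paper).

    The exponential form and the use of the citing paper's year are
    assumptions; the record does not state the thesis's decay function.
    """
    scores = defaultdict(float)
    for citing, cited in citations:
        age = current_year - year_of[citing]
        scores[cited] += math.exp(-decay * age)
    return dict(scores)


def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank over (citing, cited) edges."""
    nodes = sorted({paper for edge in citations for paper in edge})
    out_links = defaultdict(list)
    for citing, cited in citations:
        out_links[citing].append(cited)

    n = len(nodes)
    rank = {paper: 1.0 / n for paper in nodes}
    for _ in range(iterations):
        # Papers that cite nothing spread their rank uniformly.
        dangling = sum(rank[p] for p in nodes if not out_links[p])
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in nodes
        }
        for p in nodes:
            for target in out_links[p]:
                new_rank[target] += damping * rank[p] / len(out_links[p])
        rank = new_rank
    return rank


# Hypothetical toy graph: each pair reads "citing paper cites cited paper".
citations = [("B", "A"), ("C", "A"), ("C", "B"), ("D", "A")]
year_of = {"A": 2000, "B": 2004, "C": 2008, "D": 2009}

counts = citation_count(citations)
decayed = time_decayed_count(citations, year_of, current_year=2009)
ranks = pagerank(citations)
```

On this toy graph all three scores place paper `A` first, but they weight its citations differently: the decayed count discounts the older citation from `B`, while the PageRank score depends on how much rank each citing paper holds and on the damping factor, the parameter the description singles out as ill-suited to bibliographic data at its usual value.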