PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval

BACKGROUND: Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually-created hyperlinks. We consider the application of these techniques to biomedical text retrieval. In the curr...

Bibliographic Details
Main Author: Lin, Jimmy
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2442104/
https://www.ncbi.nlm.nih.gov/pubmed/18538027
http://dx.doi.org/10.1186/1471-2105-9-270
_version_ 1782156674386100224
author Lin, Jimmy
author_facet Lin, Jimmy
author_sort Lin, Jimmy
collection PubMed
description BACKGROUND: Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually-created hyperlinks. We consider the application of these techniques to biomedical text retrieval. In the current PubMed® search interface, a MEDLINE® citation is connected to a number of related citations, which are in turn connected to other citations. Thus, a MEDLINE record represents a node in a vast content-similarity network. This article explores the hypothesis that these networks can be exploited for text retrieval, in the same manner as hyperlink graphs on the Web. RESULTS: We conducted a number of reranking experiments using the TREC 2005 genomics track test collection in which scores extracted from PageRank and HITS analysis were combined with scores returned by an off-the-shelf retrieval engine. Experiments demonstrate that incorporating PageRank scores yields significant improvements in terms of standard ranked-retrieval metrics. CONCLUSION: The link structure of content-similarity networks can be exploited to improve the effectiveness of information retrieval systems. These results generalize the applicability of graph analysis algorithms to text retrieval in the biomedical domain.
format Text
id pubmed-2442104
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2442104 2008-07-01 PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval Lin, Jimmy BMC Bioinformatics Research Article BACKGROUND: Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually-created hyperlinks. We consider the application of these techniques to biomedical text retrieval. In the current PubMed® search interface, a MEDLINE® citation is connected to a number of related citations, which are in turn connected to other citations. Thus, a MEDLINE record represents a node in a vast content-similarity network. This article explores the hypothesis that these networks can be exploited for text retrieval, in the same manner as hyperlink graphs on the Web. RESULTS: We conducted a number of reranking experiments using the TREC 2005 genomics track test collection in which scores extracted from PageRank and HITS analysis were combined with scores returned by an off-the-shelf retrieval engine. Experiments demonstrate that incorporating PageRank scores yields significant improvements in terms of standard ranked-retrieval metrics. CONCLUSION: The link structure of content-similarity networks can be exploited to improve the effectiveness of information retrieval systems. These results generalize the applicability of graph analysis algorithms to text retrieval in the biomedical domain. BioMed Central 2008-06-06 /pmc/articles/PMC2442104/ /pubmed/18538027 http://dx.doi.org/10.1186/1471-2105-9-270 Text en Copyright © 2008 Lin; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Lin, Jimmy
PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval
title PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval
title_sort pagerank without hyperlinks: reranking with pubmed related article networks for biomedical text retrieval
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2442104/
https://www.ncbi.nlm.nih.gov/pubmed/18538027
http://dx.doi.org/10.1186/1471-2105-9-270
work_keys_str_mv AT linjimmy pagerankwithouthyperlinksrerankingwithpubmedrelatedarticlenetworksforbiomedicaltextretrieval