PubMed related articles: a probabilistic topic-based model for content similarity

BACKGROUND: We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document is about a particular topic is computed from term frequencies, modeled as Poisson distributions. Unlike previous probabil...

Bibliographic Details
Main authors: Lin, Jimmy; Wilbur, W John
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2212667/
https://www.ncbi.nlm.nih.gov/pubmed/17971238
http://dx.doi.org/10.1186/1471-2105-8-423
author Lin, Jimmy
Wilbur, W John
collection PubMed
description BACKGROUND: We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document is about a particular topic is computed from term frequencies, modeled as Poisson distributions. Unlike previous probabilistic retrieval models, we do not attempt to estimate relevance; rather, our focus is "relatedness", the probability that a user would want to examine a particular document given known interest in another. We also describe a novel technique for estimating parameters that does not require human relevance judgments; instead, the process is based on the existence of MeSH® in MEDLINE®. RESULTS: The pmra retrieval model was compared against bm25, a competitive probabilistic model that shares theoretical similarities. Experiments using the test collection from the TREC 2005 genomics track show a small but statistically significant improvement of pmra over bm25 in terms of precision. CONCLUSION: Our experiments suggest that the pmra model provides an effective ranking algorithm for related article search.
format Text
id pubmed-2212667
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2212667 2008-01-24 BMC Bioinformatics Research Article BioMed Central 2007-10-30 /pmc/articles/PMC2212667/ /pubmed/17971238 http://dx.doi.org/10.1186/1471-2105-8-423 Text en Copyright © 2007 Lin and Wilbur; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title PubMed related articles: a probabilistic topic-based model for content similarity
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2212667/
https://www.ncbi.nlm.nih.gov/pubmed/17971238
http://dx.doi.org/10.1186/1471-2105-8-423