PubMed related articles: a probabilistic topic-based model for content similarity
Main authors: | Lin, Jimmy; Wilbur, W John |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2007 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2212667/ https://www.ncbi.nlm.nih.gov/pubmed/17971238 http://dx.doi.org/10.1186/1471-2105-8-423 |
author | Lin, Jimmy; Wilbur, W John |
---|---|
collection | PubMed |
description | BACKGROUND: We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document is about a particular topic is computed from term frequencies, modeled as Poisson distributions. Unlike previous probabilistic retrieval models, we do not attempt to estimate relevance but instead focus on "relatedness": the probability that a user would want to examine a particular document given known interest in another. We also describe a novel technique for estimating parameters that does not require human relevance judgments; instead, the process relies on the existence of MeSH® indexing in MEDLINE®. RESULTS: The pmra retrieval model was compared against bm25, a competitive probabilistic model that shares theoretical similarities. Experiments using the test collection from the TREC 2005 genomics track show a small but statistically significant improvement of pmra over bm25 in terms of precision. CONCLUSION: Our experiments suggest that the pmra model provides an effective ranking algorithm for related article search. |
format | Text |
id | pubmed-2212667 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics |
published_online | 2007-10-30 |
record_date | 2008-01-24 |
rights | Copyright © 2007 Lin and Wilbur; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | PubMed related articles: a probabilistic topic-based model for content similarity |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2212667/ https://www.ncbi.nlm.nih.gov/pubmed/17971238 http://dx.doi.org/10.1186/1471-2105-8-423 |
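The description in this record says only that whether a document is "about" a topic is computed from term frequencies modeled as Poisson distributions, and it gives no formulas. The following is a minimal Python sketch of a generic two-Poisson version of that idea, shown purely as an illustration. It assumes that a term's count in a document follows one Poisson rate (`lam`, per word) when the document is about the term's topic and a lower rate (`mu`) when it is not, with prior `eta`, and that both rates scale with document length. The functions `p_about` and `similarity`, the parameter values, and the way per-term posteriors are combined into a score (a weight times the product of the two posteriors, summed over shared terms) are all hypothetical. They are not the pmra parameters or formula from the paper.

```python
import math


def p_about(k: int, doc_len: int, lam: float, mu: float, eta: float) -> float:
    """Posterior probability that a document is 'about' a term's topic.

    Assumed two-Poisson model (illustrative, not the paper's exact form):
    k ~ Poisson(lam * doc_len) if the document is about the topic,
    k ~ Poisson(mu * doc_len) otherwise (lam > mu); eta is the prior.
    """
    if not (lam > mu > 0 and 0 < eta < 1 and doc_len > 0):
        raise ValueError("need lam > mu > 0, 0 < eta < 1, doc_len > 0")
    a = lam * doc_len  # expected count if the document is about the topic
    b = mu * doc_len   # expected count if it is not
    # x = log of P(k, not about) / P(k, about); the k! terms cancel.
    x = k * math.log(b / a) + (a - b) + math.log((1 - eta) / eta)
    # Bayes' rule gives 1 / (1 + exp(x)); branch so exp() never overflows.
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def similarity(doc1: dict[str, int], doc2: dict[str, int],
               weights: dict[str, float],
               lam: float, mu: float, eta: float) -> float:
    """Hypothetical relatedness score: for each term the documents share,
    add the term's weight times the product of both 'about' posteriors."""
    len1 = sum(doc1.values())  # document length = total term count
    len2 = sum(doc2.values())
    score = 0.0
    # Sorted iteration keeps the summation order deterministic.
    for term in sorted(doc1.keys() & doc2.keys()):
        p1 = p_about(doc1[term], len1, lam, mu, eta)
        p2 = p_about(doc2[term], len2, lam, mu, eta)
        score += weights.get(term, 0.0) * p1 * p2
    return score
```

With the illustrative values `lam=0.05`, `mu=0.01`, `eta=0.1` and a 100-word document, `p_about` rises from about 0.01 at one occurrence to about 0.2 at three. This matches the intuition that repeated use of a term is evidence the document is about that term's topic. The log-space computation with a branch on the sign of `x` is one way to keep very long documents from overflowing `math.exp`.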