Finding Related Publications: Extending the Set of Terms Used to Assess Article Similarity

Bibliographic Details
Main authors: Wei, Wei; Marmor, Rebecca; Singh, Siddharth; Wang, Shuang; Demner-Fushman, Dina; Kuo, Tsung-Ting; Hsu, Chun-Nan; Ohno-Machado, Lucila
Format: Online Article Text
Language: English
Published: American Medical Informatics Association 2016
Subjects: Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001748/
https://www.ncbi.nlm.nih.gov/pubmed/27570676
Collection: PubMed
Description: Recommendation of related articles is an important feature of PubMed. The PubMed Related Citations (PRC) algorithm is the engine behind this feature, and it draws on information from 22 million citations. We analyzed the performance of the PRC algorithm on 4584 annotated articles from the 2005 Text REtrieval Conference (TREC) Genomics Track data. Our analysis indicated that the term PRC weighted most highly was not always the critical term, that is, the term most directly related to the topic of the article. We implemented term expansion and found it to be a promising, easy-to-implement approach for improving the performance of the PRC algorithm on the TREC 2005 Genomics data and on the TREC 2014 Clinical Decision Support Track data. For term expansion, we trained a Skip-gram model using the Word2Vec package. The extended PRC algorithm achieved higher average precision for a large subset of articles. A combination of the original and extended PRC algorithms may lead to improved performance in related-article recommendation.
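
As a rough illustration of two ideas in the description, expanding an article's term set with word-embedding neighbours and scoring rankings by average precision, here is a minimal Python sketch. It is not the authors' implementation and does not reproduce PRC's term weighting or how expanded terms were fed back into it. Assumptions: the five-word vocabulary and its three-dimensional vectors (`TOY_VECTORS`) are hypothetical stand-ins for vectors that a Skip-gram model trained with Word2Vec would produce; `expand_terms` and `average_precision` are illustrative names; expansion here simply appends each term's `k` nearest neighbours by cosine similarity.

```python
import math
from typing import Dict, List, Sequence, Set

# Hypothetical toy embeddings standing in for vectors learned by a Skip-gram model.
# A real pipeline would use vectors trained with the Word2Vec package instead.
TOY_VECTORS: Dict[str, List[float]] = {
    "apoptosis": [0.9, 0.1, 0.0],
    "caspase": [0.8, 0.2, 0.1],
    "necrosis": [0.7, 0.3, 0.0],
    "insulin": [0.0, 0.9, 0.4],
    "glucose": [0.1, 0.8, 0.5],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_terms(terms: List[str], vectors: Dict[str, List[float]], k: int = 1) -> List[str]:
    """Append the k nearest vocabulary neighbours of each known term, skipping duplicates."""
    expanded = list(terms)
    for term in terms:
        if term not in vectors:
            continue  # out-of-vocabulary terms are kept but not expanded
        neighbours = sorted(
            (w for w in vectors if w != term),
            key=lambda w: cosine(vectors[term], vectors[w]),
            reverse=True,
        )
        for w in neighbours[:k]:
            if w not in expanded:
                expanded.append(w)
    return expanded


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Sum of precision@i at each rank i holding a relevant item, divided by |relevant|."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0
```

For example, `expand_terms(["apoptosis"], TOY_VECTORS)` adds "caspase", its closest toy neighbour, and `average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})` gives (1/1 + 2/3) / 2 ≈ 0.833. Dividing by the total number of relevant items, rather than by the number retrieved, penalises rankings that miss relevant articles.
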
ID: pubmed-5001748
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Source: AMIA Jt Summits Transl Sci Proc (Articles), published 2016-07-20
Rights: ©2016 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose.