G-Bean: an ontology-graph based web tool for biomedical literature retrieval
BACKGROUND: Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' ind...
Main Authors: | Wang, James Z; Zhang, Yuanyuan; Dong, Liang; Li, Lin; Srimani, Pradip K; Yu, Philip S |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4243180/ https://www.ncbi.nlm.nih.gov/pubmed/25474588 http://dx.doi.org/10.1186/1471-2105-15-S12-S1 |
_version_ | 1782346070318120960 |
---|---|
author | Wang, James Z Zhang, Yuanyuan Dong, Liang Li, Lin Srimani, Pradip K Yu, Philip S |
author_facet | Wang, James Z Zhang, Yuanyuan Dong, Liang Li, Lin Srimani, Pradip K Yu, Philip S |
author_sort | Wang, James Z |
collection | PubMed |
description | BACKGROUND: Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph based biomedical search engine, to search biomedical articles in MEDLINE database more efficiently. METHODS: G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on user's search intention: after the user selects any article from the existing search results, G-Bean analyzes user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles. RESULTS: Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php. CONCLUSIONS: G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user. |
format | Online Article Text |
id | pubmed-4243180 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4243180 2014-11-26 G-Bean: an ontology-graph based web tool for biomedical literature retrieval Wang, James Z Zhang, Yuanyuan Dong, Liang Li, Lin Srimani, Pradip K Yu, Philip S BMC Bioinformatics Research BACKGROUND: Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph based biomedical search engine, to search biomedical articles in MEDLINE database more efficiently. METHODS: G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on user's search intention: after the user selects any article from the existing search results, G-Bean analyzes user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles. RESULTS: Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php. CONCLUSIONS: G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user. BioMed Central 2014-11-06 /pmc/articles/PMC4243180/ /pubmed/25474588 http://dx.doi.org/10.1186/1471-2105-15-S12-S1 Text en Copyright © 2014 Wang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Wang, James Z Zhang, Yuanyuan Dong, Liang Li, Lin Srimani, Pradip K Yu, Philip S G-Bean: an ontology-graph based web tool for biomedical literature retrieval |
title | G-Bean: an ontology-graph based web tool for biomedical literature retrieval |
title_sort | g-bean: an ontology-graph based web tool for biomedical literature retrieval |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4243180/ https://www.ncbi.nlm.nih.gov/pubmed/25474588 http://dx.doi.org/10.1186/1471-2105-15-S12-S1 |
work_keys_str_mv | AT wangjamesz gbeananontologygraphbasedwebtoolforbiomedicalliteratureretrieval AT zhangyuanyuan gbeananontologygraphbasedwebtoolforbiomedicalliteratureretrieval AT dongliang gbeananontologygraphbasedwebtoolforbiomedicalliteratureretrieval AT lilin gbeananontologygraphbasedwebtoolforbiomedicalliteratureretrieval AT srimanipradipk gbeananontologygraphbasedwebtoolforbiomedicalliteratureretrieval AT yuphilips gbeananontologygraphbasedwebtoolforbiomedicalliteratureretrieval |
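
The ontology-graph query expansion summarized in the description (Personalized PageRank over a concept graph, followed by TF-IDF re-ranking and top-k selection) can be illustrated with a minimal sketch. This is not G-Bean's implementation: the toy graph, the documents, the function names, and in particular the choice to combine the two scores by multiplying the PageRank value with a smoothed IDF weight are all assumptions made for illustration. The record states that the top 500 concepts are kept; the toy example keeps 3.

```python
import math

def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration for Personalized PageRank on an adjacency-list graph.

    graph: dict mapping each concept to a list of neighbouring concepts
           (every neighbour must also be a key of the dict).
    seeds: set of concepts found in the user's query (restart nodes).
    """
    nodes = list(graph)
    # Restart distribution: uniform over the query concepts.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                # Dangling node: send its mass back to the restart nodes.
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
                continue
            share = damping * rank[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        rank = nxt
    return rank

def idf(concept, documents):
    """Smoothed inverse document frequency (an assumed weighting variant)."""
    df = sum(1 for doc in documents if concept in doc)
    return math.log((1 + len(documents)) / (1 + df)) + 1.0

def expand_query(graph, seeds, documents, top_k):
    """Rank concepts by PageRank * IDF (assumed combination); keep top_k."""
    rank = personalized_pagerank(graph, seeds)
    scored = {c: rank[c] * idf(c, documents) for c in graph}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]

# Hypothetical toy ontology graph and document collection.
toy_graph = {
    "asthma": ["bronchial disease", "respiratory hypersensitivity"],
    "bronchial disease": ["asthma", "lung disease"],
    "respiratory hypersensitivity": ["asthma"],
    "lung disease": ["bronchial disease"],
    "fracture": [],
}
toy_documents = [
    {"asthma", "bronchial disease"},
    {"lung disease"},
    {"fracture"},
]
expanded = expand_query(toy_graph, {"asthma"}, toy_documents, top_k=3)
```

In this sketch a concept that cannot be reached from the query concepts (here `fracture`) receives a PageRank score of zero and so never enters the expanded query, which is the property that makes the personalized variant useful for query-specific expansion.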