
Semantically linking and browsing PubMed abstracts with gene ontology


Bibliographic Details
Main authors: Vanteru, Bhanu C; Shaik, Jahangheer S; Yeasin, Mohammed
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2386052/
https://www.ncbi.nlm.nih.gov/pubmed/18366599
http://dx.doi.org/10.1186/1471-2164-9-S1-S10
author Vanteru, Bhanu C
Shaik, Jahangheer S
Yeasin, Mohammed
collection PubMed
description BACKGROUND: Technological advances over the past decade have led to massive progress in the field of biotechnology, and that progress is documented in research articles. PubMed is currently the most widely used repository of biomedical literature. As of 2007 it held about 17 million abstracts, a volume that requires methods for efficiently retrieving and browsing relevant information. State-of-the-art tools such as GOPubmed use simple keyword-based techniques to retrieve abstracts from PubMed and link them to the Gene Ontology (GO). This paper changes the paradigm by introducing a semantics-enabled technique for linking PubMed to the Gene Ontology, called SEGOPubmed, for ontology-based browsing. A Latent Semantic Analysis (LSA) framework is used to semantically interface PubMed abstracts with the Gene Ontology. RESULTS: An empirical analysis compares the performance of SEGOPubmed with that of GOPubmed. The analysis is first performed using a few well-referenced query words; a statistical analysis is then performed using a GO-curated dataset as ground truth. The analysis suggests that SEGOPubmed performs better than the classic GOPubmed because it incorporates semantics. CONCLUSIONS: The LSA technique is applied to the PubMed abstracts obtained on the basis of the user query and of the semantic similarity between the query and the abstracts. The analyses using well-referenced keywords show that the proposed semantics-sensitive technique outperformed string-comparison-based techniques in associating relevant abstracts with GO terms. SEGOPubmed also extracted abstracts in which the keywords do not appear in isolation (i.e., they appear in combination with other terms), which could not be retrieved by simple term-matching techniques.
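The LSA step described above (a term-document matrix, a truncated singular value decomposition, then similarity between a query and the abstracts in the reduced space) can be sketched as follows. This is a minimal, generic LSA retrieval sketch in Python with NumPy, not the authors' SEGOPubmed implementation: the tokenizer, the 1 + ln(tf) weighting, the rank k = 2, the helper names (tokenize, weight, lsa_rank) and the toy "abstracts" are all hypothetical choices made for illustration. In the GO setting, the query string would presumably be a GO term name or definition, which is also an assumption.

```python
import math
import re
from collections import Counter

import numpy as np


def tokenize(text):
    # Hypothetical, deliberately crude tokenizer: lowercase, keep alphabetic runs.
    return re.findall(r"[a-z]+", text.lower())


def weight(tokens, index, n_terms):
    # Assumed weighting: log-scaled term frequency 1 + ln(tf).
    # Terms that are not in the vocabulary are silently dropped.
    vec = np.zeros(n_terms)
    for term, tf in Counter(tokens).items():
        if term in index:
            vec[index[term]] = 1.0 + math.log(tf)
    return vec


def lsa_rank(docs, query, k=2):
    """Rank docs against query by cosine similarity in a rank-k LSA space."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}
    # Term-document matrix A: one row per term, one column per document.
    A = np.column_stack([weight(tokenize(d), index, len(vocab)) for d in docs])
    # Thin SVD A = U diag(s) Vt; singular values come back in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    Uk = U[:, :k]
    # Fold documents and the query onto the same k latent axes (Uk^T x).
    doc_vecs = Uk.T @ A  # shape (k, n_docs)
    q_vec = Uk.T @ weight(tokenize(query), index, len(vocab))
    scores = []
    for j in range(len(docs)):
        d = doc_vecs[:, j]
        denom = np.linalg.norm(d) * np.linalg.norm(q_vec)
        # A query with no known terms has a zero vector; score it 0.0.
        scores.append((j, float(d @ q_vec / denom) if denom > 0 else 0.0))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy "abstracts"; document 1 never mentions "apoptosis".
    toy_docs = [
        "apoptosis programmed cell death caspase",
        "caspase activation triggers cell death",
        "ribosome translation protein synthesis",
        "protein synthesis at the ribosome requires mrna",
    ]
    for doc_id, score in lsa_rank(toy_docs, "apoptosis", k=2):
        print(doc_id, round(score, 3))
```

With these toy inputs, document 1 scores about as high as document 0 even though it never contains "apoptosis": the two share "caspase", "cell" and "death", so they collapse onto the same latent dimension, whereas a plain keyword match would score document 1 at zero. Documents 2 and 3 score near zero. The choice of k trades noise reduction against merging unrelated topics; this record does not state which k, weighting or preprocessing SEGOPubmed actually used.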
format Text
id pubmed-2386052
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2386052 2008-06-04 Semantically linking and browsing PubMed abstracts with gene ontology Vanteru, Bhanu C Shaik, Jahangheer S Yeasin, Mohammed BMC Genomics Research BioMed Central 2008-03-20 /pmc/articles/PMC2386052/ /pubmed/18366599 http://dx.doi.org/10.1186/1471-2164-9-S1-S10 Text en Copyright © 2008 Vanteru et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Semantically linking and browsing PubMed abstracts with gene ontology
topic Research