
eGIFT: Mining Gene Information from the Literature

BACKGROUND: With the biomedical literature continually expanding, searching PubMed for information about specific genes becomes increasingly difficult. Not only can thousands of results be returned, but gene name ambiguity leads to many irrelevant hits. As a result, it is difficult for life scientis...

Bibliographic Details
Main Authors: Tudor, Catalina O, Schmidt, Carl J, Vijay-Shanker, K
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2929241/
https://www.ncbi.nlm.nih.gov/pubmed/20696046
http://dx.doi.org/10.1186/1471-2105-11-418
author Tudor, Catalina O
Schmidt, Carl J
Vijay-Shanker, K
collection PubMed
description BACKGROUND: With the biomedical literature continually expanding, searching PubMed for information about specific genes becomes increasingly difficult. Not only can thousands of results be returned, but gene name ambiguity leads to many irrelevant hits. As a result, it is difficult for life scientists and gene curators to rapidly get an overall picture about a specific gene from documents that mention its names and synonyms.
RESULTS: In this paper, we present eGIFT (http://biotm.cis.udel.edu/eGIFT), a web-based tool that associates informative terms, called iTerms, and sentences containing them, with genes. To associate iTerms with a gene, eGIFT ranks iTerms about the gene, based on a score which compares the frequency of occurrence of a term in the gene's literature to its frequency of occurrence in documents about genes in general. To retrieve a gene's documents (Medline abstracts), eGIFT considers all gene names, aliases, and synonyms. Since many of the gene names can be ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to this gene. Another additional filtering process is applied to retain those abstracts that focus on the gene rather than mention it in passing. eGIFT's information for a gene is pre-computed and users of eGIFT can search for genes by using a name or an EntrezGene identifier. iTerms are grouped into different categories to facilitate a quick inspection. eGIFT also links an iTerm to sentences mentioning the term to allow users to see the relation between the iTerm and the gene. We evaluated the precision and recall of eGIFT's iTerms for 40 genes; between 88% and 94% of the iTerms were marked as salient by our evaluators, and 94% of the UniProtKB keywords for these genes were also identified by eGIFT as iTerms.
CONCLUSIONS: Our evaluations suggest that iTerms capture highly-relevant aspects of genes. Furthermore, by showing sentences containing these terms, eGIFT can provide a quick description of a specific gene. eGIFT helps not only life scientists survey results of high-throughput experiments, but also annotators to find articles describing gene aspects and functions.
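Note: the ranking idea in the abstract (comparing how often a term occurs in a gene's literature with how often it occurs in documents about genes in general) can be illustrated with a minimal sketch. The abstract does not give eGIFT's actual scoring formula, so the score below, a plain difference of document-frequency proportions, is an assumption chosen for illustration only. The function names `document_frequency` and `rank_terms`, the whitespace tokenization, and the `top_k` parameter are hypothetical and are not taken from eGIFT.

```python
from collections import Counter


def document_frequency(docs):
    """Return, for each term, the fraction of documents that contain it."""
    counts = Counter()
    for doc in docs:
        # Count each term at most once per document (naive whitespace tokens).
        counts.update(set(doc.lower().split()))
    total = len(docs)
    return {term: count / total for term, count in counts.items()}


def rank_terms(gene_docs, background_docs, top_k=5):
    """Rank terms by (gene document frequency - background document frequency).

    ASSUMPTION: this difference score is a stand-in for illustration;
    it is not the score used by eGIFT.
    """
    gene_df = document_frequency(gene_docs)
    background_df = document_frequency(background_docs)
    scores = {
        term: freq - background_df.get(term, 0.0)
        for term, freq in gene_df.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

Under this sketch, a term that appears in most of a gene's abstracts but rarely in the background set ranks high, while a term that is common everywhere (such as "gene") receives a low or negative score.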
format Text
id pubmed-2929241
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2929241 2010-08-28 eGIFT: Mining Gene Information from the Literature Tudor, Catalina O Schmidt, Carl J Vijay-Shanker, K BMC Bioinformatics Software BioMed Central 2010-08-09 /pmc/articles/PMC2929241/ /pubmed/20696046 http://dx.doi.org/10.1186/1471-2105-11-418 Text en Copyright ©2010 Tudor et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title eGIFT: Mining Gene Information from the Literature
topic Software