GLAD4U: deriving and prioritizing gene lists from PubMed literature

BACKGROUND: Answering questions such as "Which genes are related to breast cancer?" usually requires retrieving relevant publications through the PubMed search engine, reading these publications, and creating gene lists. This process is not only time-consuming, but also prone to errors. RE...


Bibliographic Details
Main Authors: Jourquin, Jérôme; Duncan, Dexter; Shi, Zhiao; Zhang, Bing
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3535723/
https://www.ncbi.nlm.nih.gov/pubmed/23282288
http://dx.doi.org/10.1186/1471-2164-13-S8-S20
author Jourquin, Jérôme
Duncan, Dexter
Shi, Zhiao
Zhang, Bing
collection PubMed
description BACKGROUND: Answering questions such as "Which genes are related to breast cancer?" usually requires retrieving relevant publications through the PubMed search engine, reading these publications, and creating gene lists. This process is not only time-consuming, but also prone to errors.
RESULTS: We report GLAD4U (Gene List Automatically Derived For You), a new, free web-based gene retrieval and prioritization tool. GLAD4U takes advantage of existing resources of the NCBI to ensure computational efficiency. The quality of gene lists created by GLAD4U for three Gene Ontology (GO) terms and three disease terms was assessed using corresponding "gold standard" lists curated in public databases. For all queries, GLAD4U gene lists showed very high recall but low precision, leading to a low F-measure. By comparison, EBIMed's recall was consistently lower than GLAD4U's, but its precision was higher. To present the most relevant genes at the top of a list, we studied two prioritization methods, one based on publication count and one on the hypergeometric test, and compared the ranked lists and those generated by EBIMed to the gold standards. Both GLAD4U methods outperformed EBIMed for all queries based on a variety of quality metrics. Moreover, the hypergeometric method allowed for better performance by thresholding out genes with low scores. In addition, manual examination suggests that many false positives could be explained by the incompleteness of the gold standards. The GLAD4U user interface accepts any valid PubMed query, and its output page displays the ranked gene list, information associated with each gene, and chronologically ordered supporting publications, along with a summary of the run and links for file export, functional enrichment analysis, and protein interaction network analysis.
CONCLUSIONS: GLAD4U has a high overall recall. Although precision is generally low, the prioritization methods successfully rank truly relevant genes at the top of the lists to facilitate efficient browsing. GLAD4U is simple to use, and its interface can be found at: http://bioinfo.vanderbilt.edu/glad4u.
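The evaluation and prioritization ideas named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give GLAD4U's exact formulas, so everything here is an assumption made for illustration: the function names are hypothetical, the hypergeometric score is taken to be the upper-tail probability of seeing at least `k` query-retrieved publications among the `K` publications linked to a gene, given `n` retrieved publications in a corpus of `N`, and genes are ranked by ascending p-value.

```python
from math import comb


def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F-measure) of a gene list against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: publications in the corpus, K: publications linked to the gene,
    n: publications retrieved by the query, k: retrieved publications
    linked to the gene. (Assumed parameterization, not taken from the paper.)
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def rank_genes(gene_pubs, retrieved, corpus_size):
    """Rank genes by hypergeometric p-value (smaller means more specific to the query).

    gene_pubs maps a gene symbol to the publication IDs linked to it.
    Returns (gene, publication_count, p_value) tuples, best first.
    """
    retrieved = set(retrieved)
    scored = []
    for gene, pubs in gene_pubs.items():
        pubs = set(pubs)
        k = len(pubs & retrieved)
        if k == 0:
            continue  # gene is never mentioned in the retrieved publications
        p = hypergeom_upper_tail(k, len(pubs), len(retrieved), corpus_size)
        scored.append((gene, k, p))
    # Ties on p-value are broken by publication count, then by gene name.
    return sorted(scored, key=lambda t: (t[2], -t[1], t[0]))


if __name__ == "__main__":
    # Toy data: 10 publications in the corpus, 3 retrieved by the query.
    toy_gene_pubs = {"G1": [1, 2, 3], "G2": [1, 4, 5, 6, 7, 8, 9, 10]}
    for gene, count, p in rank_genes(toy_gene_pubs, [1, 2, 3], 10):
        print(gene, count, p)
```

In this toy example, a gene linked to many publications overall (`G2`) scores worse than a gene whose publications all fall inside the retrieved set (`G1`). That is the intuition behind preferring a hypergeometric score to a raw publication count, and applying a p-value cutoff to the ranked list would correspond to the thresholding mentioned in the abstract.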
format Online
Article
Text
id pubmed-3535723
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3535723 2013-01-04 GLAD4U: deriving and prioritizing gene lists from PubMed literature Jourquin, Jérôme; Duncan, Dexter; Shi, Zhiao; Zhang, Bing BMC Genomics Research BioMed Central 2012-12-17 /pmc/articles/PMC3535723/ /pubmed/23282288 http://dx.doi.org/10.1186/1471-2164-13-S8-S20 Text en Copyright ©2012 Jourquin et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title GLAD4U: deriving and prioritizing gene lists from PubMed literature
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3535723/
https://www.ncbi.nlm.nih.gov/pubmed/23282288
http://dx.doi.org/10.1186/1471-2164-13-S8-S20