Annokey: an annotation tool based on key term search of the NCBI Entrez Gene database

Bibliographic Details
Main Authors: Park, Daniel J; Nguyen-Dumont, Tú; Kang, Sori; Verspoor, Karin; Pope, Bernard J
Format: Online, Article, Text
Language: English
Published: BioMed Central 2014
Subjects: Software Review
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4106183/
http://dx.doi.org/10.1186/1751-0473-9-15
author Park, Daniel J
Nguyen-Dumont, Tú
Kang, Sori
Verspoor, Karin
Pope, Bernard J
collection PubMed
description BACKGROUND: The NCBI Entrez Gene and PubMed databases contain a wealth of high-quality information about genes for many different organisms. The NCBI Entrez online web-search interface is convenient for manually searching a small number of genes but impractical for the large gene lists typically produced by genomics projects.
RESULTS: We have developed Annokey, an efficient open-source tool implemented in Python that annotates gene lists with the results of a keyword search of the NCBI Entrez Gene database and linked PubMed article information. The user steers the search by specifying a ranked list of keywords (including multi-word phrases and regular expressions) that are correlated with their topic of interest. The rank of each matched term helps the user decide which genes to investigate further. We applied Annokey to the entire human Entrez Gene database using the key term “DNA repair” and assessed its performance in identifying the 176 members of a published “gold standard” list of genes established to be involved in this pathway. For this test case we observed a sensitivity of 97% and a specificity of 96%.
CONCLUSIONS: Annokey facilitates the identification of genes related to an area of interest, a task that can be onerous if performed manually on a large number of genes. By capitalizing on the high-quality information in the Entrez Gene database in a way that is both scalable and compatible with automated analysis pipelines, Annokey offers the potential to significantly enhance research productivity.
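The ranked keyword search summarized above can be illustrated with a minimal Python sketch. It is not Annokey's own implementation: the helper names `rank_gene` and `sensitivity_specificity`, the extra keywords, and the toy gene records are hypothetical, and the sketch assumes each gene's Entrez Gene and linked PubMed text has already been retrieved as a plain string. Each keyword is treated as a case-insensitive regular expression (so multi-word phrases work unchanged), and a gene is tagged with the rank of the highest-ranked keyword it matches. The second helper shows how sensitivity and specificity figures like those reported above are defined, given predicted, gold-standard, and searched gene sets.

```python
import re
from typing import Optional


def rank_gene(text: str, ranked_keywords: list[str]) -> Optional[tuple[int, str]]:
    """Return (rank, keyword) for the best-ranked keyword found in text.

    Rank 0 is the most important keyword. Each keyword is treated as a
    case-insensitive regular expression, so single words, multi-word phrases
    and regex patterns are handled uniformly. Returns None if nothing matches.
    (Hypothetical helper, not Annokey's API.)
    """
    for rank, keyword in enumerate(ranked_keywords):
        if re.search(keyword, text, flags=re.IGNORECASE):
            return rank, keyword
    return None


def sensitivity_specificity(
    predicted: set[str], gold: set[str], universe: set[str]
) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    `predicted` holds genes flagged by the search, `gold` the gold-standard
    positives, and `universe` every gene that was searched.
    """
    tp = len(predicted & gold)
    fn = len(gold - predicted)
    negatives = universe - gold
    fp = len(predicted & negatives)
    tn = len(negatives - predicted)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical keyword ranking; only "DNA repair" comes from the record above.
    keywords = [r"DNA repair", r"mismatch repair", r"double[- ]strand break"]
    # Toy stand-ins for retrieved Entrez Gene / PubMed text (not real records).
    records = {
        "GENE_A": "Involved in DNA repair via homologous recombination.",
        "GENE_B": "Binds double-strand break ends.",
        "GENE_C": "Structural protein of the cytoskeleton.",
    }
    for gene, text in records.items():
        print(gene, rank_gene(text, keywords))
```

Run as a script, this prints `GENE_A (0, 'DNA repair')`, `GENE_B (2, 'double[- ]strand break')` and `GENE_C None`. Reporting only the best-ranked match is a simplification chosen for brevity; a fuller annotator might record every matching term and where it was found.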
format Online
Article
Text
id pubmed-4106183
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4106183 2014-07-23
Source Code Biol Med, Software Review
BioMed Central 2014-06-26
/pmc/articles/PMC4106183/
http://dx.doi.org/10.1186/1751-0473-9-15
Text en
Copyright © 2014 Park et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Annokey: an annotation tool based on key term search of the NCBI Entrez Gene database
topic Software Review