
Evading the annotation bottleneck: using sequence similarity to search non-sequence gene data

BACKGROUND: Non-sequence gene data (images, literature, etc.) can be found in many different public databases. Access to these data is mostly by text based methods using gene names; however, gene annotation is neither complete, nor fully systematic between organisms, and is also not generally stable...


Bibliographic Details
Main Authors:	Gilchrist, Michael J; Christensen, Mikkel B; Harland, Richard; Pollet, Nicolas; Smith, James C; Ueno, Naoto; Papalopulu, Nancy
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2587480/ https://www.ncbi.nlm.nih.gov/pubmed/18928517 http://dx.doi.org/10.1186/1471-2105-9-442

author	Gilchrist, Michael J; Christensen, Mikkel B; Harland, Richard; Pollet, Nicolas; Smith, James C; Ueno, Naoto; Papalopulu, Nancy
author_sort	Gilchrist, Michael J
collection	PubMed
description	BACKGROUND: Non-sequence gene data (images, literature, etc.) can be found in many different public databases. Access to these data is mostly by text based methods using gene names; however, gene annotation is neither complete, nor fully systematic between organisms, and is also not generally stable over time. This provides some challenges for text based access, especially for cross-species searches. We propose a method for non-sequence data retrieval based on sequence similarity, which removes dependence on annotation and text searches. This work was motivated by the need to provide better access to large numbers of in situ images, and the observation that such image data were usually associated with a specific gene sequence. Sequence similarity searches are found in existing gene oriented databases, but mostly give indirect access to non-sequence data via navigational links. RESULTS: Three applications were built to explore the proposed method: accessing image data, literature and gene names. Searches are initiated with the sequence of the user's gene of interest, which is searched against a database of sequences associated with the target data. The matching (non-sequence) target data are returned directly to the user's browser, organised by sequence similarity. The method worked well for the intended application in image data management. Comparison with text based searches of the image data set showed the accuracy of the method. Applied to literature searches it facilitated retrieval of mostly high relevance references. Applied to gene name data it provided a useful analysis of name variation of related genes within and between species. CONCLUSION: This method makes a powerful and useful addition to existing methods for searching gene data based on text retrieval or curated gene lists. In particular the method facilitates cross-species comparisons, and enables the handling of novel or otherwise un-annotated genes. Applications using the method are quick and easy to build, and the data require little maintenance. This approach largely circumvents the need for annotation, which can be a major obstacle to the development of genomic scale data resources.
format	Text
id	pubmed-2587480
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2587480 2008-11-26 Evading the annotation bottleneck: using sequence similarity to search non-sequence gene data Gilchrist, Michael J; Christensen, Mikkel B; Harland, Richard; Pollet, Nicolas; Smith, James C; Ueno, Naoto; Papalopulu, Nancy BMC Bioinformatics Methodology Article BioMed Central 2008-10-17 /pmc/articles/PMC2587480/ /pubmed/18928517 http://dx.doi.org/10.1186/1471-2105-9-442 Text en Copyright © 2008 Gilchrist et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Evading the annotation bottleneck: using sequence similarity to search non-sequence gene data
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2587480/ https://www.ncbi.nlm.nih.gov/pubmed/18928517 http://dx.doi.org/10.1186/1471-2105-9-442
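
The retrieval idea summarised in the description field (a query sequence is compared against a database of sequences that are each linked to non-sequence data, and the linked data are returned ordered by similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which similarity search tool was used, so a simple k-mer Jaccard score stands in for a real alignment search here. The names `Record`, `kmers`, `similarity`, `search`, the threshold `min_score` and the image file names are all hypothetical.

```python
from typing import NamedTuple


class Record(NamedTuple):
    sequence: str  # sequence associated with the target data
    payload: str   # non-sequence data, e.g. an image file name (hypothetical)


def kmers(seq: str, k: int = 4) -> set[str]:
    """Return the set of overlapping k-mers in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of k-mer sets.

    Assumption: a crude stand-in for an alignment-based score.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def search(query: str, database: list[Record],
           min_score: float = 0.2) -> list[tuple[float, str]]:
    """Return (score, payload) pairs at or above min_score, best match first."""
    hits = [(similarity(query, rec.sequence), rec.payload) for rec in database]
    hits = [hit for hit in hits if hit[0] >= min_score]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Toy database: each sequence points at one piece of non-sequence data.
DATABASE = [
    Record("TTTTCCCCGGGGAAAATTTTCCCC", "image_003.png"),
    Record("ATGGCGTACGTTAGCCTAGGATA", "image_002.png"),
    Record("ATGGCGTACGTTAGCCTAGGCTA", "image_001.png"),
]

results = search("ATGGCGTACGTTAGCCTAGGCTA", DATABASE)
for score, payload in results:
    print(f"{score:.2f}\t{payload}")
```

No gene names or annotation appear anywhere in the lookup; the sequence itself is the only key, which is the property the abstract emphasises for cross-species and un-annotated genes.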
