
annot8r: GO, EC and KEGG annotation of EST datasets

BACKGROUND: The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets poses a formidable challenge for research groups that do no...


Bibliographic Details
Main Authors: Schmid, Ralf, Blaxter, Mark L
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2324097/
https://www.ncbi.nlm.nih.gov/pubmed/18400082
http://dx.doi.org/10.1186/1471-2105-9-180
_version_ 1782152717645381632
author Schmid, Ralf
Blaxter, Mark L
author_facet Schmid, Ralf
Blaxter, Mark L
author_sort Schmid, Ralf
collection PubMed
description BACKGROUND: The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets pose a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO terms, EC numbers and KEGG pathways. RESULTS: annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and in a relational PostgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. CONCLUSION: annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non-model species EST-sequencing projects.
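The parse-and-lookup step described in the abstract can be illustrated with a minimal Python sketch. It is not annot8r's own code: it assumes BLAST results in the standard 12-column tabular layout, uses a plain dictionary as a stand-in for the reference database, and the function name `annotate`, the e-value cutoff and the sample accessions and GO identifiers are all hypothetical, chosen only for illustration.

```python
import csv
import io
from collections import defaultdict

# Column order of standard 12-column BLAST tabular output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def annotate(blast_tsv, reference, max_evalue=1e-8):
    """Collect, per query sequence, the annotations of all hits that pass
    the e-value cutoff. `reference` maps subject IDs to annotation lists
    (a stand-in for a real reference database); the cutoff is an assumption."""
    annotations = defaultdict(set)
    reader = csv.DictReader(blast_tsv, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) > max_evalue:
            continue  # hit too weak to transfer an annotation
        annotations[row["qseqid"]].update(reference.get(row["sseqid"], ()))
    return {query: sorted(terms) for query, terms in annotations.items()}


# Hypothetical reference data and BLAST hits, for illustration only.
reference = {
    "P12345": ["GO:0003824", "GO:0008152"],
    "Q99999": ["GO:0005515"],
}
blast_output = io.StringIO(
    "EST001\tP12345\t85.0\t120\t18\t0\t1\t120\t5\t124\t1e-40\t150\n"
    "EST001\tQ99999\t30.0\t50\t35\t0\t1\t50\t10\t59\t0.5\t25.0\n"
    "EST002\tQ99999\t70.0\t90\t27\t0\t1\t90\t1\t90\t2e-20\t95.0\n"
)
result = annotate(blast_output, reference)
print(result)
```

In this sketch the second hit for EST001 is discarded because its e-value exceeds the cutoff, so EST001 only inherits the annotations of its strong hit. A real pipeline would query a relational database rather than an in-memory dictionary.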
format Text
id pubmed-2324097
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-23240972008-04-22 annot8r: GO, EC and KEGG annotation of EST datasets Schmid, Ralf Blaxter, Mark L BMC Bioinformatics Software BACKGROUND: The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets poses a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways. RESULTS: annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and also in a relational postgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. CONCLUSION: annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently. 
The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non-model species EST-sequencing projects. BioMed Central 2008-04-09 /pmc/articles/PMC2324097/ /pubmed/18400082 http://dx.doi.org/10.1186/1471-2105-9-180 Text en Copyright © 2008 Schmid and Blaxter; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Schmid, Ralf
Blaxter, Mark L
annot8r: GO, EC and KEGG annotation of EST datasets
title annot8r: GO, EC and KEGG annotation of EST datasets
title_full annot8r: GO, EC and KEGG annotation of EST datasets
title_fullStr annot8r: GO, EC and KEGG annotation of EST datasets
title_full_unstemmed annot8r: GO, EC and KEGG annotation of EST datasets
title_short annot8r: GO, EC and KEGG annotation of EST datasets
title_sort annot8r: go, ec and kegg annotation of est datasets
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2324097/
https://www.ncbi.nlm.nih.gov/pubmed/18400082
http://dx.doi.org/10.1186/1471-2105-9-180
work_keys_str_mv AT schmidralf annot8rgoecandkeggannotationofestdatasets
AT blaxtermarkl annot8rgoecandkeggannotationofestdatasets