
ORFer – retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files

BACKGROUND: Functional genomics involves the parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite of the cloning and recombinant expression of these proteins. RESULTS: A Java program was developed for retrieval of prote...


Bibliographic Details
Main authors: Büssow, Konrad; Hoffmann, Steve; Sievert, Volker
Format: Text
Language: English
Published: BioMed Central 2002
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC139979/
https://www.ncbi.nlm.nih.gov/pubmed/12493080
http://dx.doi.org/10.1186/1471-2105-3-40
_version_ 1782120581707071488
author Büssow, Konrad
Hoffmann, Steve
Sievert, Volker
author_facet Büssow, Konrad
Hoffmann, Steve
Sievert, Volker
author_sort Büssow, Konrad
collection PubMed
description BACKGROUND: Functional genomics involves parallel experimentation with large sets of proteins. This requires managing large sets of open reading frames as a prerequisite for cloning and recombinant expression of these proteins. RESULTS: A Java program was developed for retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include the sequence name, the organism and the completeness of the sequence. The program has a graphical user interface but can also be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks that it translates correctly. ORFer accepts user input as single GenBank GI identifiers or accession numbers, or as lists of them. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or exported as text files in FASTA or tab-delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. CONCLUSION: The ORFer program allows fast retrieval of DNA sequences, protein sequences, their open reading frames and sequence annotations from GenBank. It also supports storage of sequences and features in a relational database. Such a database can supplement a laboratory information management system (LIMS) with appropriate sequence information.
format Text
id pubmed-139979
institution National Center for Biotechnology Information
language English
publishDate 2002
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1399792003-01-16 ORFer – retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files Büssow, Konrad Hoffmann, Steve Sievert, Volker BMC Bioinformatics Methodology article BACKGROUND: Functional genomics involves the parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite of the cloning and recombinant expression of these proteins. RESULTS: A Java program was developed for retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include sequence name, organism and also the completeness of the sequence. The program has a graphical user interface, although it can be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks its correct translation. ORFer accepts user input in the form of single or lists of GenBank GI identifiers or accession numbers. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or can be exported as text files in Fasta or tabulator delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. CONCLUSION: The ORFer program allows for fast retrieval of DNA sequences, protein sequences and their open reading frames and sequence annotations from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information system (LIMS) with appropriate sequence information. 
BioMed Central 2002-12-19 /pmc/articles/PMC139979/ /pubmed/12493080 http://dx.doi.org/10.1186/1471-2105-3-40 Text en Copyright ©2002 Büssow et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
spellingShingle Methodology article
Büssow, Konrad
Hoffmann, Steve
Sievert, Volker
ORFer – retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files
title ORFer – retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files
title_sort orfer – retrieval of protein sequences and open reading frames from genbank and storage into relational databases or text files
topic Methodology article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC139979/
https://www.ncbi.nlm.nih.gov/pubmed/12493080
http://dx.doi.org/10.1186/1471-2105-3-40
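
The description above says that ORFer extracts the open reading frame for a protein, checks its translation, and exports sequences in Fasta or tab-delimited text. The following is a minimal Java sketch of those three steps, not ORFer's own code. The class name `OrfCheckSketch`, its method names, and the example sequence are hypothetical. The sketch assumes the standard genetic code (NCBI translation table 1) and a coding sequence that starts in frame; the record does not say which rules ORFer applies.

```java
public class OrfCheckSketch {
    // Standard genetic code (NCBI translation table 1).
    // Codons are indexed with bases in the order T, C, A, G.
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // Translate complete codons; stop codons become '*', codons with
    // unrecognised bases become 'X'. Trailing partial codons are ignored.
    static String translate(String dna) {
        String seq = dna.toUpperCase();
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            int b1 = BASES.indexOf(seq.charAt(i));
            int b2 = BASES.indexOf(seq.charAt(i + 1));
            int b3 = BASES.indexOf(seq.charAt(i + 2));
            if (b1 < 0 || b2 < 0 || b3 < 0) {
                protein.append('X');
            } else {
                protein.append(AMINO_ACIDS.charAt(16 * b1 + 4 * b2 + b3));
            }
        }
        return protein.toString();
    }

    // Assumed check: the ORF, minus one terminal stop codon, must
    // translate exactly to the annotated protein sequence.
    static boolean matchesProtein(String orf, String protein) {
        String translated = translate(orf);
        if (translated.endsWith("*")) {
            translated = translated.substring(0, translated.length() - 1);
        }
        return translated.equalsIgnoreCase(protein);
    }

    // Format one sequence as a Fasta record with fixed-width lines.
    static String toFasta(String id, String sequence, int width) {
        StringBuilder sb = new StringBuilder(">").append(id).append('\n');
        for (int i = 0; i < sequence.length(); i += width) {
            sb.append(sequence, i, Math.min(i + width, sequence.length()));
            sb.append('\n');
        }
        return sb.toString();
    }

    // Format one record as a tab-delimited row.
    static String toTabRow(String... fields) {
        return String.join("\t", fields);
    }

    public static void main(String[] args) {
        String orf = "ATGGCCAAATAA"; // made-up example, not a GenBank entry
        System.out.println(translate(orf));
        System.out.println(matchesProtein(orf, "MAK"));
        System.out.print(toFasta("example_orf", orf, 60));
        System.out.println(toTabRow("example_orf", "MAK", orf));
    }
}
```

A real check would also have to handle alternative start codons and organism-specific genetic codes, which this sketch leaves out.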