PRESTA: associating promoter sequences with information on gene expression

BACKGROUND: Large sets of well-characterized promoter sequences are required to facilitate the understanding of promoter architecture. The major sequence databases are a prospective source of upstream regulatory regions, but suffer from inaccurate annotation. The software tool PRESTA (PRomoter EST A...

Bibliographic Details
Main author: Mach, Václav
Format: Text
Language: English
Published: BioMed Central 2002
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC126875/
https://www.ncbi.nlm.nih.gov/pubmed/12225589
_version_ 1782120335328411648
author Mach, Václav
collection PubMed
description BACKGROUND: Large sets of well-characterized promoter sequences are required to facilitate the understanding of promoter architecture. The major sequence databases are a prospective source of upstream regulatory regions, but suffer from inaccurate annotation. The software tool PRESTA (PRomoter EST Association) presented in this study is designed for efficient recovery of characterized and partially verified promoters from GenBank and EMBL libraries. RESULTS: The PRESTA algorithm examines the putative GenBank/EMBL promoters and automatically removes most of the poorly annotated entries. The remaining records are connected to expressed sequence tags (ESTs) through a high-stringency BLAST search. The frequency and source of recovered ESTs provide an estimate of the activity and expression pattern of the promoter, and the ESTs' 5' ends assist in transcription start-site verification. The PRESTA database provides easy access to non-redundant upstream regulatory regions recently extracted by the PRESTA algorithm. The current size of this resource is 552 human and 241 mouse promoters. Surprisingly, no overlap between the PRESTA database and the Eukaryotic Promoter Database (EPD) was detected by sequence comparison. CONCLUSIONS: The PRESTA algorithm demonstrates the principle of promoter verification by mapping EST 5' ends. The publicly available PRESTA database collects hundreds of characterized and partially verified promoter sequences and is complementary to other promoter databases.
format Text
id pubmed-126875
institution National Center for Biotechnology Information
language English
publishDate 2002
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-126875 2002-10-09 PRESTA: associating promoter sequences with information on gene expression Mach, Václav Genome Biol Research BACKGROUND: Large sets of well-characterized promoter sequences are required to facilitate the understanding of promoter architecture. The major sequence databases are a prospective source of upstream regulatory regions, but suffer from inaccurate annotation. The software tool PRESTA (PRomoter EST Association) presented in this study is designed for efficient recovery of characterized and partially verified promoters from GenBank and EMBL libraries. RESULTS: The PRESTA algorithm examines the putative GenBank/EMBL promoters and automatically removes most of the poorly annotated entries. The remaining records are connected to expressed sequence tags (ESTs) through a high-stringency BLAST search. The frequency and source of recovered ESTs provide an estimate of the activity and expression pattern of the promoter, and the ESTs' 5' ends assist in transcription start-site verification. The PRESTA database provides easy access to non-redundant upstream regulatory regions recently extracted by the PRESTA algorithm. The current size of this resource is 552 human and 241 mouse promoters. Surprisingly, no overlap between the PRESTA database and the Eukaryotic Promoter Database (EPD) was detected by sequence comparison. CONCLUSIONS: The PRESTA algorithm demonstrates the principle of promoter verification by mapping EST 5' ends. The publicly available PRESTA database collects hundreds of characterized and partially verified promoter sequences and is complementary to other promoter databases. BioMed Central 2002 2002-08-21 /pmc/articles/PMC126875/ /pubmed/12225589 Text en Copyright © 2002 Mach, licensee BioMed Central Ltd
spellingShingle Research
Mach, Václav
PRESTA: associating promoter sequences with information on gene expression
title PRESTA: associating promoter sequences with information on gene expression
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC126875/
https://www.ncbi.nlm.nih.gov/pubmed/12225589
work_keys_str_mv AT machvaclav prestaassociatingpromotersequenceswithinformationongeneexpression
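
The RESULTS part of the abstract says that the frequency and source of the recovered ESTs estimate a promoter's activity and expression pattern, and that the ESTs' 5' ends assist in transcription start-site verification. The sketch below illustrates that idea only and is not the PRESTA implementation. The `EstHit` structure, the `summarize_promoter` function, the sample hits, and the 50 bp `tss_window` tolerance are all hypothetical assumptions, and the BLAST search itself is not shown.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EstHit:
    """One EST matched to a promoter record (hypothetical structure)."""
    est_id: str
    tissue: str          # source library of the EST
    five_prime_pos: int  # EST 5' end relative to the annotated start site, in bp


def summarize_promoter(hits, tss_window=50):
    """Summarize EST evidence for a single promoter.

    tss_window is an assumed tolerance in bp, not a value from the paper.
    """
    # Number of ESTs per source tissue: a rough expression pattern.
    tissues = Counter(hit.tissue for hit in hits)
    # ESTs whose 5' end falls close to the annotated start site.
    near_tss = sum(1 for hit in hits if abs(hit.five_prime_pos) <= tss_window)
    return {
        "est_count": len(hits),        # rough proxy for promoter activity
        "tissues": dict(tissues),
        "ests_near_tss": near_tss,
        "tss_supported": near_tss > 0,
    }


# Made-up example hits, for illustration only.
hits = [
    EstHit("est1", "liver", 3),
    EstHit("est2", "liver", 12),
    EstHit("est3", "brain", 240),
]
summary = summarize_promoter(hits)
```

Here `summary["est_count"]` stands in for the EST frequency, `summary["tissues"]` for the EST source, and `summary["tss_supported"]` for the start-site check. A real pipeline would need to define how alignment coordinates map onto the annotated start site.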