DBD: a transcription factor prediction database

Regulation of gene expression influences almost all biological processes in an organism; sequence-specific DNA-binding transcription factors are critical to this control. For most genomes, the repertoire of transcription factors is only partially known. Hitherto transcription factor identification h...

Bibliographic Details
Main Authors: Kummerfeld, Sarah K., Teichmann, Sarah A.
Format: Text
Language: English
Published: Oxford University Press 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1347493/
https://www.ncbi.nlm.nih.gov/pubmed/16381970
http://dx.doi.org/10.1093/nar/gkj131
_version_ 1782126632295727104
author Kummerfeld, Sarah K.
Teichmann, Sarah A.
author_facet Kummerfeld, Sarah K.
Teichmann, Sarah A.
author_sort Kummerfeld, Sarah K.
collection PubMed
description Regulation of gene expression influences almost all biological processes in an organism; sequence-specific DNA-binding transcription factors are critical to this control. For most genomes, the repertoire of transcription factors is only partially known. Hitherto transcription factor identification has been largely based on genome annotation pipelines that use pairwise sequence comparisons, which detect only those factors similar to known genes, or on functional classification schemes that amalgamate many types of proteins into the category of ‘transcription factor’. Using a novel transcription factor identification method, the DBD transcription factor database fills this void, providing genome-wide transcription factor predictions for organisms from across the tree of life. The prediction method behind DBD identifies sequence-specific DNA-binding transcription factors through homology using profile hidden Markov models (HMMs) of domains. Thus, it is limited to factors that are homologous to those HMMs. The collection of HMMs is taken from two existing databases (Pfam and SUPERFAMILY), and is limited to models that exclusively detect transcription factors that specifically recognize DNA sequences. It does not include basal transcription factors or chromatin-associated proteins, for instance. Based on comparison with experimentally verified annotation, the prediction procedure is between 95% and 99% accurate. Between one quarter and one-half of our genome-wide predicted transcription factors represent previously uncharacterized proteins. The DBD () consists of predicted transcription factor repertoires for 150 completely sequenced genomes, their domain assignments and the hand-curated list of DNA-binding domain HMMs. Users can browse, search or download the predictions by genome, domain family or sequence identifier, view families of transcription factors based on domain architecture and receive predictions for a protein sequence.
format Text
id pubmed-1347493
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-13474932006-01-25 DBD: a transcription factor prediction database Kummerfeld, Sarah K. Teichmann, Sarah A. Nucleic Acids Res Article Regulation of gene expression influences almost all biological processes in an organism; sequence-specific DNA-binding transcription factors are critical to this control. For most genomes, the repertoire of transcription factors is only partially known. Hitherto transcription factor identification has been largely based on genome annotation pipelines that use pairwise sequence comparisons, which detect only those factors similar to known genes, or on functional classification schemes that amalgamate many types of proteins into the category of ‘transcription factor’. Using a novel transcription factor identification method, the DBD transcription factor database fills this void, providing genome-wide transcription factor predictions for organisms from across the tree of life. The prediction method behind DBD identifies sequence-specific DNA-binding transcription factors through homology using profile hidden Markov models (HMMs) of domains. Thus, it is limited to factors that are homologous to those HMMs. The collection of HMMs is taken from two existing databases (Pfam and SUPERFAMILY), and is limited to models that exclusively detect transcription factors that specifically recognize DNA sequences. It does not include basal transcription factors or chromatin-associated proteins, for instance. Based on comparison with experimentally verified annotation, the prediction procedure is between 95% and 99% accurate. Between one quarter and one-half of our genome-wide predicted transcription factors represent previously uncharacterized proteins. The DBD () consists of predicted transcription factor repertoires for 150 completely sequenced genomes, their domain assignments and the hand-curated list of DNA-binding domain HMMs. 
Users can browse, search or download the predictions by genome, domain family or sequence identifier, view families of transcription factors based on domain architecture and receive predictions for a protein sequence. Oxford University Press 2006-01-01 2005-12-28 /pmc/articles/PMC1347493/ /pubmed/16381970 http://dx.doi.org/10.1093/nar/gkj131 Text en © The Author 2006. Published by Oxford University Press. All rights reserved
spellingShingle Article
Kummerfeld, Sarah K.
Teichmann, Sarah A.
DBD: a transcription factor prediction database
title DBD: a transcription factor prediction database
title_full DBD: a transcription factor prediction database
title_fullStr DBD: a transcription factor prediction database
title_full_unstemmed DBD: a transcription factor prediction database
title_short DBD: a transcription factor prediction database
title_sort dbd: a transcription factor prediction database
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1347493/
https://www.ncbi.nlm.nih.gov/pubmed/16381970
http://dx.doi.org/10.1093/nar/gkj131
work_keys_str_mv AT kummerfeldsarahk dbdatranscriptionfactorpredictiondatabase
AT teichmannsaraha dbdatranscriptionfactorpredictiondatabase