
Ab initio identification of putative human transcription factor binding sites by comparative genomics

BACKGROUND: Understanding transcriptional regulation of gene expression is one of the greatest challenges of modern molecular biology. A central role in this mechanism is played by transcription factors, which typically bind to specific, short DNA sequence motifs usually located in the upstream regi...


Bibliographic Details
Main authors: Corà, D, Herrmann, C, Dieterich, C, Di Cunto, F, Provero, P, Caselle, M
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1097714/
https://www.ncbi.nlm.nih.gov/pubmed/15865625
http://dx.doi.org/10.1186/1471-2105-6-110
author Corà, D
Herrmann, C
Dieterich, C
Di Cunto, F
Provero, P
Caselle, M
author_sort Corà, D
collection PubMed
description BACKGROUND: Understanding the transcriptional regulation of gene expression is one of the greatest challenges of modern molecular biology. A central role in this mechanism is played by transcription factors (TFs), which typically bind to specific, short DNA sequence motifs usually located in the upstream region of the regulated genes. We discuss here a simple and powerful approach for the ab initio identification of these cis-regulatory motifs. The method integrates several elements: human-mouse comparison, statistical analysis of genomic sequences, and the concept of coregulation. We apply it to a complete scan of the human genome. RESULTS: Using the catalogue of conserved upstream sequences collected in the CORG database, we construct sets of genes sharing the same overrepresented motif (short DNA sequence) in their upstream regions in both human and mouse. We perform this construction for all possible motifs from 5 to 8 nucleotides in length and then filter the resulting sets, looking for two types of evidence of coregulation: first, we analyze the Gene Ontology annotation of the genes in the set, searching for statistically significant common annotations; second, we analyze the expression profiles of the genes in the set as measured by microarray experiments, searching for evidence of coexpression. The sets that pass one or both filters are conjectured to contain a significant fraction of coregulated genes, and the upstream motifs characterizing the sets are thus good candidates to be the binding sites of the TFs involved in such regulation. In this way we find various known motifs and also some new candidate binding sites. CONCLUSION: We have discussed a new integrated algorithm for the ab initio identification of transcription factor binding sites in the human genome. The method is based on three ingredients: comparative genomics, overrepresentation, and different types of coregulation. Applied to a full scan of the human genome, the method gives satisfactory results.
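The set-construction and annotation-filter steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the statistical tests used, so everything here is an assumption made for illustration: the function names, the binomial tail test for overrepresentation, the pooled background frequency, the hypergeometric tail for a shared Gene Ontology term, and the cutoff value are hypothetical and are not taken from the paper.

```python
from math import comb

def count_occurrences(seq, motif):
    """Count (possibly overlapping) occurrences of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)

def binomial_tail(n, trials, p):
    """P(X >= n) for X ~ Binomial(trials, p). Fine for toy sizes only."""
    return sum(comb(trials, j) * p**j * (1 - p)**(trials - j)
               for j in range(n, trials + 1))

def background_frequency(seqs, motif):
    """Assumed background model: motif frequency pooled over all sequences."""
    seqs = list(seqs)
    hits = sum(count_occurrences(s, motif) for s in seqs)
    positions = sum(max(len(s) - len(motif) + 1, 0) for s in seqs)
    return hits / positions if positions else 0.0

def overrepresented_genes(upstream, motif, cutoff):
    """Genes whose upstream sequence holds more copies of motif than the
    pooled background predicts (binomial tail below cutoff).
    upstream maps gene name -> upstream sequence."""
    p = background_frequency(upstream.values(), motif)
    selected = set()
    for gene, seq in upstream.items():
        trials = len(seq) - len(motif) + 1
        n = count_occurrences(seq, motif)
        if n > 0 and trials > 0 and binomial_tail(n, trials, p) < cutoff:
            selected.add(gene)
    return selected

def conserved_motif_set(human, mouse, motif, cutoff=0.01):
    """Genes in which motif is overrepresented in both species.
    Assumes orthologous genes share the same key in both dictionaries."""
    return (overrepresented_genes(human, motif, cutoff)
            & overrepresented_genes(mouse, motif, cutoff))

def hypergeom_tail(k, annotated, set_size, total):
    """P(X >= k): chance that a random gene set of size set_size, drawn from
    total genes of which `annotated` carry a GO term, contains at least k
    annotated genes. A small value suggests a shared annotation."""
    upper = min(annotated, set_size)
    favourable = sum(comb(annotated, j) * comb(total - annotated, set_size - j)
                     for j in range(k, upper + 1))
    return favourable / comb(total, set_size)

# Toy data: 20 genes, 40 bp of upstream sequence each (all hypothetical).
enriched = "ACGTG" * 4 + "T" * 20
plain = "T" * 40
human = {f"g{i}": plain for i in range(1, 21)}
mouse = dict(human)
human["g1"] = human["g2"] = enriched   # enriched in human only for g2
mouse["g1"] = enriched                 # enriched in both species for g1
result = conserved_motif_set(human, mouse, "ACGTG")
```

In this toy input only g1 carries the motif in both species, so it is the only gene kept. A full scan would repeat the call for every motif of length 5 to 8, and the coexpression filter on microarray profiles is not sketched here.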
format Text
id pubmed-1097714
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1097714 2005-05-12 Ab initio identification of putative human transcription factor binding sites by comparative genomics Corà, D Herrmann, C Dieterich, C Di Cunto, F Provero, P Caselle, M BMC Bioinformatics Research Article BioMed Central 2005-05-02 /pmc/articles/PMC1097714/ /pubmed/15865625 http://dx.doi.org/10.1186/1471-2105-6-110 Text en Copyright © 2005 Corà et al; licensee BioMed Central Ltd.
title Ab initio identification of putative human transcription factor binding sites by comparative genomics
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1097714/
https://www.ncbi.nlm.nih.gov/pubmed/15865625
http://dx.doi.org/10.1186/1471-2105-6-110