
Genome wide identification of regulatory motifs in Bacillus subtilis

BACKGROUND: To explain the vastly different phenotypes exhibited by the same organism under different conditions, it is essential that we understand how the organism's genes are coordinately regulated. While there are many excellent tools for predicting sequences encoding proteins or RNA genes,...


Bibliographic Details
Main Authors: Mwangi, Michael M, Siggia, Eric D
Format: Text
Language: English
Published: BioMed Central 2003
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC165661/
https://www.ncbi.nlm.nih.gov/pubmed/12749771
http://dx.doi.org/10.1186/1471-2105-4-18
_version_ 1782120845433372672
author Mwangi, Michael M
Siggia, Eric D
author_facet Mwangi, Michael M
Siggia, Eric D
author_sort Mwangi, Michael M
collection PubMed
description BACKGROUND: To explain the vastly different phenotypes exhibited by the same organism under different conditions, it is essential that we understand how the organism's genes are coordinately regulated. While there are many excellent tools for predicting sequences encoding proteins or RNA genes, few algorithms exist to predict regulatory sequences on a genome wide scale with no prior information. RESULTS: To identify motifs involved in the control of transcription, an algorithm was developed that searches upstream of operons for improbably frequent dimers. The algorithm was applied to the B. subtilis genome, which is predicted to encode approximately 200 DNA binding proteins. The dimers found to be over-represented could be clustered into 317 distinct groups, each thought to represent a class of motifs uniquely recognized by some transcription factor. For each cluster of dimers, a representative weight matrix was derived and scored over the regions upstream of the operons to predict the sites recognized by the cluster's factor, and a putative regulon of the operons immediately downstream of the sites was inferred. The distribution in number of operons per predicted regulon is comparable to that for well characterized transcription factors. The most highly over-represented dimers matched σ(A), the T-box, and σ(W) sites. We have evidence to suggest that at least 52 of our clusters of dimers represent actual regulatory motifs, based on the groups' weight matrix matches to experimentally characterized sites, the functional similarity of the component operons of the groups' regulons, and the positional biases of the weight matrix matches. All predictions are assigned a significance value, and thresholds are set to avoid false positives. Where possible, we examine our false negatives, drawing examples from known regulatory motifs and regulons inferred from RNA expression data. CONCLUSIONS: We have demonstrated that in the case of B. subtilis our algorithm allows for the genome wide identification of regulatory sites. As well as recovering known sites, we predict new sites of yet uncharacterized factors. Results can be viewed at .
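The weight-matrix step mentioned in the description (a matrix derived from a cluster of related sites, then scored over upstream regions) can be illustrated with a minimal sketch. This is not the authors' implementation: the log-odds formulation, the pseudocount, the uniform background frequency, the threshold, and the toy sequences are all assumptions chosen for illustration, and the function names `build_weight_matrix`, `score_window`, and `scan` are hypothetical.

```python
import math

BASES = "ACGT"

def build_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length, uppercase ACGT sites.

    Assumptions (not from the record): a pseudocount of 1 per base and a
    uniform background frequency of 0.25.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log2(counts[base] / total / background) for base in BASES}
        )
    return matrix

def score_window(matrix, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(matrix, window))

def scan(matrix, sequence, threshold):
    """Return (offset, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for offset in range(len(sequence) - width + 1):
        score = score_window(matrix, sequence[offset:offset + width])
        if score >= threshold:
            hits.append((offset, score))
    return hits

# Toy data, invented for illustration only.
example_sites = ["TTGACA", "TTGACA", "TTGACT", "TTGATA"]
example_upstream = "GGCCTTGACAGGCC"

matrix = build_weight_matrix(example_sites)
hits = scan(matrix, example_upstream, threshold=5.0)
for offset, score in hits:
    print(offset, round(score, 2))
```

In this sketch an operon would be assigned to a putative regulon when its upstream region contains at least one hit. The fixed threshold stands in for the significance value the description refers to, whose actual calculation is not given in this record.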
format Text
id pubmed-165661
institution National Center for Biotechnology Information
language English
publishDate 2003
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-165661 2003-07-19 Genome wide identification of regulatory motifs in Bacillus subtilis Mwangi, Michael M Siggia, Eric D BMC Bioinformatics Research Article BACKGROUND: To explain the vastly different phenotypes exhibited by the same organism under different conditions, it is essential that we understand how the organism's genes are coordinately regulated. While there are many excellent tools for predicting sequences encoding proteins or RNA genes, few algorithms exist to predict regulatory sequences on a genome wide scale with no prior information. RESULTS: To identify motifs involved in the control of transcription, an algorithm was developed that searches upstream of operons for improbably frequent dimers. The algorithm was applied to the B. subtilis genome, which is predicted to encode approximately 200 DNA binding proteins. The dimers found to be over-represented could be clustered into 317 distinct groups, each thought to represent a class of motifs uniquely recognized by some transcription factor. For each cluster of dimers, a representative weight matrix was derived and scored over the regions upstream of the operons to predict the sites recognized by the cluster's factor, and a putative regulon of the operons immediately downstream of the sites was inferred. The distribution in number of operons per predicted regulon is comparable to that for well characterized transcription factors. The most highly over-represented dimers matched σ(A), the T-box, and σ(W) sites. We have evidence to suggest that at least 52 of our clusters of dimers represent actual regulatory motifs, based on the groups' weight matrix matches to experimentally characterized sites, the functional similarity of the component operons of the groups' regulons, and the positional biases of the weight matrix matches. All predictions are assigned a significance value, and thresholds are set to avoid false positives. Where possible, we examine our false negatives, drawing examples from known regulatory motifs and regulons inferred from RNA expression data. CONCLUSIONS: We have demonstrated that in the case of B. subtilis our algorithm allows for the genome wide identification of regulatory sites. As well as recovering known sites, we predict new sites of yet uncharacterized factors. Results can be viewed at . BioMed Central 2003-05-16 /pmc/articles/PMC165661/ /pubmed/12749771 http://dx.doi.org/10.1186/1471-2105-4-18 Text en Copyright © 2003 Mwangi and Siggia; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
spellingShingle Research Article
Mwangi, Michael M
Siggia, Eric D
Genome wide identification of regulatory motifs in Bacillus subtilis
title Genome wide identification of regulatory motifs in Bacillus subtilis
title_full Genome wide identification of regulatory motifs in Bacillus subtilis
title_fullStr Genome wide identification of regulatory motifs in Bacillus subtilis
title_full_unstemmed Genome wide identification of regulatory motifs in Bacillus subtilis
title_short Genome wide identification of regulatory motifs in Bacillus subtilis
title_sort genome wide identification of regulatory motifs in bacillus subtilis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC165661/
https://www.ncbi.nlm.nih.gov/pubmed/12749771
http://dx.doi.org/10.1186/1471-2105-4-18
work_keys_str_mv AT mwangimichaelm genomewideidentificationofregulatorymotifsinbacillussubtilis
AT siggiaericd genomewideidentificationofregulatorymotifsinbacillussubtilis