OHMM: a Hidden Markov Model accurately predicting the occupancy of a transcription factor with a self-overlapping binding motif

Bibliographic Details
Main Authors: Drawid, Amar; Gupta, Nupur; Nagaraj, Vijayalakshmi H; Gélinas, Céline; Sengupta, Anirvan M
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2718928/
https://www.ncbi.nlm.nih.gov/pubmed/19583839
http://dx.doi.org/10.1186/1471-2105-10-208
author Drawid, Amar
Gupta, Nupur
Nagaraj, Vijayalakshmi H
Gélinas, Céline
Sengupta, Anirvan M
collection PubMed
description BACKGROUND: DNA sequence binding motifs for several important transcription factors happen to be self-overlapping. Many current regulatory site identification methods do not explicitly account for overlapping sites. Moreover, most methods use arbitrary thresholds and fail to provide a biophysical interpretation of statistical quantities. In addition, commonly used approaches do not include the location of a site with respect to the transcription start site (TSS) in an integrated probabilistic framework while identifying sites. Ignoring these features can lead to inaccurate predictions as well as incorrect design and interpretation of experimental results.
RESULTS: We have developed a tool based on a Hidden Markov Model (HMM) that identifies the binding locations of transcription factors that prefer self-overlapping DNA motifs by combining the effects of their alternative binding modes. Interpreting HMM parameters as biophysical quantities, this method uses the occupancy probability of a transcription factor on a DNA sequence as the discriminant function, earning the algorithm the name OHMM: Occupancy via Hidden Markov Model. OHMM learns the classification threshold by training emission probabilities on unaligned sequences containing known sites and by estimating transition probabilities to reflect the site density across all promoters in a genome. While identifying sites, it adjusts parameters to model how site density changes with distance from the transcription start site. Moreover, it provides guidance for designing padding sequences in gel shift experiments. For binding sites of the transcription factor NF-κB, we find that the occupancy probability predicted by OHMM correlates well with the binding affinity measured in gel shift experiments. High evolutionary conservation scores and enrichment in experimentally verified regulated genes suggest that the NF-κB binding sites predicted by our method are likely to be functional.
CONCLUSION: Our method specifically addresses the identification of locations with multiple overlapping binding sites by computing the local occupancy of the transcription factor. Moreover, treating OHMM as a biophysical model allows us to learn the classification threshold in a principled manner. Another feature of OHMM is that transition probabilities are allowed to change with location relative to the TSS. OHMM could be used to predict physical occupancy and provides guidance for the proper design of gel shift experiments. Based on our predictions, we uncovered new insights into NF-κB function and regulation, as well as possible new biological roles of NF-κB.
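The description explains that OHMM scores a sequence by the occupancy probability of the factor, summing over alternative (possibly overlapping) binding configurations, with transition probabilities that may vary with distance from the TSS. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes a hypothetical single-strand HMM with one background state and one state per motif column, a made-up toy position weight matrix (`TOY_PWM`, motif "GGAA"), a uniform background (`UNIFORM_BG`), and a site-start probability `site_prob` that is either a constant or a per-position list. The function name `occupancy` and all parameter values are illustrative. The reverse strand, NF-κB-specific parameters, and OHMM's threshold training are omitted.

```python
from typing import Dict, List, Sequence, Union

# Hypothetical, simplified occupancy HMM (an illustration, not the authors' OHMM):
#   state 0      = unbound background
#   states 1..w  = motif columns 1..w (forward strand only; reverse strand omitted)
# A single path can occupy only one site at a time, so overlapping candidate sites
# are mutually exclusive alternative paths; the posterior sums over all of them.


def occupancy(seq: str,
              pwm: Sequence[Dict[str, float]],
              background: Dict[str, float],
              site_prob: Union[float, Sequence[float]]) -> List[float]:
    """Posterior probability that each position of seq lies inside a bound site."""
    n, w = len(seq), len(pwm)
    if n == 0:
        return []
    # p[t]: probability that a site starts at t when the factor is not already bound.
    # A per-position list can model site density changing with distance from the TSS.
    if isinstance(site_prob, (int, float)):
        p = [float(site_prob)] * n
    else:
        p = [float(x) for x in site_prob]

    def emit(state: int, base: str) -> float:
        return background[base] if state == 0 else pwm[state - 1][base]

    # Scaled forward pass: each alpha row is normalised to sum to 1.
    alpha: List[List[float]] = []
    scale: List[float] = []
    for t, base in enumerate(seq):
        row = [0.0] * (w + 1)
        if t == 0:
            free = 1.0  # the sequence starts outside any site
        else:
            prev = alpha[t - 1]
            free = prev[0] + prev[w]  # in background, or a site just ended
            for k in range(2, w + 1):
                row[k] = prev[k - 1] * emit(k, base)  # continue inside a site
        row[0] = free * (1.0 - p[t]) * emit(0, base)
        row[1] = free * p[t] * emit(1, base)
        c = sum(row)
        scale.append(c)
        alpha.append([x / c for x in row])

    # Scaled backward pass; only paths ending in background or on a complete site count.
    beta = [[0.0] * (w + 1) for _ in range(n)]
    beta[n - 1][0] = beta[n - 1][w] = 1.0
    for t in range(n - 2, -1, -1):
        nxt, base, c = beta[t + 1], seq[t + 1], scale[t + 1]
        # From background or the last motif column: stay unbound or start a new site.
        free_next = ((1.0 - p[t + 1]) * emit(0, base) * nxt[0]
                     + p[t + 1] * emit(1, base) * nxt[1]) / c
        beta[t][0] = beta[t][w] = free_next
        # From motif column k < w: advance deterministically to column k + 1.
        for k in range(1, w):
            beta[t][k] = emit(k + 1, base) * nxt[k + 1] / c

    # Occupancy = posterior mass on all motif states at each position.
    occ = []
    for a, b in zip(alpha, beta):
        post = [x * y for x, y in zip(a, b)]
        occ.append(sum(post[1:]) / sum(post))
    return occ


# Toy usage with made-up parameters: a 4-bp motif "GGAA" and a uniform background.
def _column(consensus: str) -> Dict[str, float]:
    return {b: (0.97 if b == consensus else 0.01) for b in "ACGT"}


TOY_PWM = [_column(b) for b in "GGAA"]
UNIFORM_BG = {b: 0.25 for b in "ACGT"}
toy_occ = occupancy("TTTTGGAATTTT", TOY_PWM, UNIFORM_BG, site_prob=0.05)
```

Because each path holds at most one bound factor at a time, overlapping candidate sites compete rather than being double-counted, so the occupancy at a position stays between 0 and 1. Passing a per-position list, for example `[0.05 if abs(i - tss) < 50 else 0.005 for i in range(len(seq))]` for a TSS at index `tss`, is one hypothetical way to let the site-start probability depend on distance from the TSS; the functional form OHMM actually uses is not given in this record. Scaling the forward and backward passes keeps the probabilities from underflowing on long promoter sequences.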
format Text
id pubmed-2718928
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2718928 2009-07-31 BMC Bioinformatics Research Article BioMed Central 2009-07-07
Copyright © 2009 Drawid et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title OHMM: a Hidden Markov Model accurately predicting the occupancy of a transcription factor with a self-overlapping binding motif
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2718928/
https://www.ncbi.nlm.nih.gov/pubmed/19583839
http://dx.doi.org/10.1186/1471-2105-10-208