Cargando…
An Empirical Prior Improves Accuracy for Bayesian Estimation of Transcription Factor Binding Site Frequencies within Gene Promoters
A Bayesian method for sampling from the distribution of matches to a precompiled transcription factor binding site (TFBS) sequence pattern (conditioned on an observed nucleotide sequence and the sequence pattern) is described. The method takes a position frequency matrix as input for a set of repres...
Autor principal: | |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Libertas Academica
2016
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5081247/ https://www.ncbi.nlm.nih.gov/pubmed/27812284 http://dx.doi.org/10.4137/BBI.S29330 |
Sumario: | A Bayesian method for sampling from the distribution of matches to a precompiled transcription factor binding site (TFBS) sequence pattern (conditioned on an observed nucleotide sequence and the sequence pattern) is described. The method takes a position frequency matrix as input for a set of representative binding sites for a transcription factor and two sets of noncoding, 5′ regulatory sequences for gene sets that are to be compared. An empirical prior on the frequency A (per base pair of gene-vicinal, noncoding DNA) of TFBSs is developed using data from the ENCODE project and incorporated into the method. In addition, a probabilistic model for binding site occurrences conditioned on λ is developed analytically, taking into account the finite-width effects of binding sites. The count of TFBS β (conditioned on the observed sequence) is sampled using Metropolis–Hastings with an information entropy-based move generator. The derivation of the method is presented in a step-by-step fashion, starting from specific conditional independence assumptions. Empirical results show that the newly proposed prior on β improves accuracy for estimating the number of TFBS within a set of promoter sequences. |
---|