An Empirical Prior Improves Accuracy for Bayesian Estimation of Transcription Factor Binding Site Frequencies within Gene Promoters
A Bayesian method for sampling from the distribution of matches to a precompiled transcription factor binding site (TFBS) sequence pattern (conditioned on an observed nucleotide sequence and the sequence pattern) is described. The method takes a position frequency matrix as input for a set of repres...
Main author: | Ramsey, Stephen A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Libertas Academica 2016 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5081247/ https://www.ncbi.nlm.nih.gov/pubmed/27812284 http://dx.doi.org/10.4137/BBI.S29330 |
_version_ | 1782462855016087552 |
---|---|
author | Ramsey, Stephen A. |
collection | PubMed |
description | A Bayesian method for sampling from the distribution of matches to a precompiled transcription factor binding site (TFBS) sequence pattern (conditioned on an observed nucleotide sequence and the sequence pattern) is described. The method takes a position frequency matrix as input for a set of representative binding sites for a transcription factor and two sets of noncoding, 5′ regulatory sequences for gene sets that are to be compared. An empirical prior on the frequency λ (per base pair of gene-vicinal, noncoding DNA) of TFBSs is developed using data from the ENCODE project and incorporated into the method. In addition, a probabilistic model for binding site occurrences conditioned on λ is developed analytically, taking into account the finite-width effects of binding sites. The count of TFBS β (conditioned on the observed sequence) is sampled using Metropolis–Hastings with an information entropy-based move generator. The derivation of the method is presented in a step-by-step fashion, starting from specific conditional independence assumptions. Empirical results show that the newly proposed prior on β improves accuracy for estimating the number of TFBS within a set of promoter sequences. |
format | Online Article Text |
id | pubmed-5081247 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Libertas Academica |
record_format | MEDLINE/PubMed |
spelling | pubmed-5081247 2016-11-03 An Empirical Prior Improves Accuracy for Bayesian Estimation of Transcription Factor Binding Site Frequencies within Gene Promoters Ramsey, Stephen A. Bioinform Biol Insights Methodology A Bayesian method for sampling from the distribution of matches to a precompiled transcription factor binding site (TFBS) sequence pattern (conditioned on an observed nucleotide sequence and the sequence pattern) is described. The method takes a position frequency matrix as input for a set of representative binding sites for a transcription factor and two sets of noncoding, 5′ regulatory sequences for gene sets that are to be compared. An empirical prior on the frequency λ (per base pair of gene-vicinal, noncoding DNA) of TFBSs is developed using data from the ENCODE project and incorporated into the method. In addition, a probabilistic model for binding site occurrences conditioned on λ is developed analytically, taking into account the finite-width effects of binding sites. The count of TFBS β (conditioned on the observed sequence) is sampled using Metropolis–Hastings with an information entropy-based move generator. The derivation of the method is presented in a step-by-step fashion, starting from specific conditional independence assumptions. Empirical results show that the newly proposed prior on β improves accuracy for estimating the number of TFBS within a set of promoter sequences. Libertas Academica 2016-10-25 /pmc/articles/PMC5081247/ /pubmed/27812284 http://dx.doi.org/10.4137/BBI.S29330 Text en © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License. |
spellingShingle | Methodology Ramsey, Stephen A. An Empirical Prior Improves Accuracy for Bayesian Estimation of Transcription Factor Binding Site Frequencies within Gene Promoters |
title | An Empirical Prior Improves Accuracy for Bayesian Estimation of Transcription Factor Binding Site Frequencies within Gene Promoters |
title_sort | empirical prior improves accuracy for bayesian estimation of transcription factor binding site frequencies within gene promoters |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5081247/ https://www.ncbi.nlm.nih.gov/pubmed/27812284 http://dx.doi.org/10.4137/BBI.S29330 |