A Bayesian Search for Transcriptional Motifs

Bibliographic Details
Main authors: Miller, Andrew K.; Print, Cristin G.; Nielsen, Poul M. F.; Crampin, Edmund J.
Format: Text
Language: English
Published: Public Library of Science, 2010
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2987817/
https://www.ncbi.nlm.nih.gov/pubmed/21124986
http://dx.doi.org/10.1371/journal.pone.0013897

Abstract: Identifying transcription factor (TF) binding sites (TFBSs) is an important step towards understanding transcriptional regulation. A common approach is to use gaplessly aligned, experimentally supported TFBSs for a particular TF, and algorithmically search for more occurrences of the same TFBSs. The largest publicly available databases of TF binding specificities contain models represented as position weight matrices (PWMs). Other methods use more sophisticated representations, but these have more limited databases or are not publicly available. Therefore, this paper focuses on methods that search using one PWM per TF. An algorithm, MATCH™, for identifying TFBSs corresponding to a particular PWM is available, but it is not based on a rigorous statistical model of TF binding, making it difficult to interpret or adjust the parameters and output of the algorithm. Furthermore, there is no public description of the algorithm sufficient to reproduce it exactly. Another algorithm, MAST, computes a p-value for the presence of a TFBS using true probabilities of finding each base at each offset from that position. We developed a statistical model, BaSeTraM, for the binding of TFs to TFBSs, taking into account random variation in the base present at each position within a TFBS. Treating the counts in the matrices and the sequences of sites as random variables, we combine this TFBS composition model with a background model to obtain a Bayesian classifier. We implemented our classifier in a package (SBaSeTraM). We tested SBaSeTraM against a MATCH™ implementation by searching all probes used in an experimental Saccharomyces cerevisiae TF binding dataset and comparing our predictions to the data. We found no statistically significant differences in sensitivity between the algorithms (at fixed selectivity), indicating that SBaSeTraM's performance is at least comparable to the leading currently available algorithm. Our software is freely available at: http://wiki.github.com/A1kmm/sbasetram/building-the-tools.
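The abstract describes the classifier only at a high level. The Python sketch below is a rough illustration of the general idea. It is not the published BaSeTraM model or the SBaSeTraM code.

It rests on several assumptions:

- Each matrix column's counts are treated as draws from unknown base frequencies with a symmetric Dirichlet prior.
- Positions within a site are independent.
- The background is a zero-order (i.i.d.) base composition.

A window is scored by the posterior predictive probability under the site model. That score is compared against the background model, and Bayes' rule turns the two likelihoods into a posterior probability that the window is a binding site. The pseudocount `alpha`, the prior `prior_site`, the function names and the toy matrix are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"

# Hypothetical count matrix type: one dict per motif position, mapping
# base -> number of aligned sites carrying that base at that position.
CountMatrix = Sequence[Dict[str, float]]


def site_log_likelihood(window: str, counts: CountMatrix, alpha: float = 0.5) -> float:
    """log P(window | site), assuming independent positions and a symmetric
    Dirichlet(alpha) prior on each column's base frequencies. Integrating the
    frequencies out gives the posterior predictive (n_jb + alpha) / (N_j + 4*alpha)."""
    total = 0.0
    for column, base in zip(counts, window):
        n_j = sum(column[b] for b in BASES)
        total += math.log((column[base] + alpha) / (n_j + 4 * alpha))
    return total


def background_log_likelihood(window: str, background: Dict[str, float]) -> float:
    """log P(window | background) under an assumed i.i.d. (zero-order) base composition."""
    return sum(math.log(background[b]) for b in window)


def posterior_site_probability(window: str, counts: CountMatrix,
                               background: Dict[str, float],
                               prior_site: float = 1e-3, alpha: float = 0.5) -> float:
    """Two-class Bayes' rule: P(site | window), given a hypothetical prior P(site)."""
    log_site = math.log(prior_site) + site_log_likelihood(window, counts, alpha)
    log_bg = math.log1p(-prior_site) + background_log_likelihood(window, background)
    d = log_site - log_bg  # log posterior odds of "site" versus "background"
    # Numerically stable logistic function of the log odds.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def scan_sequence(seq: str, counts: CountMatrix, background: Dict[str, float],
                  prior_site: float = 1e-3, alpha: float = 0.5) -> List[Tuple[int, float]]:
    """Score every forward-strand window; windows with non-ACGT symbols are skipped."""
    seq = seq.upper()
    width = len(counts)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if all(b in BASES for b in window):
            p = posterior_site_probability(window, counts, background, prior_site, alpha)
            hits.append((i, p))
    return hits


# Hypothetical toy matrix built from 10 aligned sites that all read "GAT".
TOY_COUNTS = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
UNIFORM = {b: 0.25 for b in BASES}

if __name__ == "__main__":
    for pos, p in scan_sequence("CCGATCC", TOY_COUNTS, UNIFORM, prior_site=0.01):
        print(pos, round(p, 4))
```

Only the forward strand is scanned here. A practical search would usually also score the reverse complement. The posterior also depends strongly on the assumed `prior_site`, so that value would need to be chosen with care rather than taken from this sketch.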

Journal: PLoS One (Research Article), published 2010-11-18
Record: pubmed-2987817 (MEDLINE/PubMed format, National Center for Biotechnology Information)
Copyright: Miller et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.