
On the Value of Intra-Motif Dependencies of Human Insulator Protein CTCF

The binding affinity of DNA-binding proteins such as transcription factors is mainly determined by the base composition of the corresponding binding site on the DNA strand. Most proteins do not bind only a single sequence, but rather a set of sequences, which may be modeled by a sequence motif. Algo...


Bibliographic Details
Main Authors: Eggeling, Ralf; Gohr, André; Keilwagen, Jens; Mohr, Michaela; Posch, Stefan; Smith, Andrew D.; Grosse, Ivo
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3899044/
https://www.ncbi.nlm.nih.gov/pubmed/24465627
http://dx.doi.org/10.1371/journal.pone.0085629
author Eggeling, Ralf
Gohr, André
Keilwagen, Jens
Mohr, Michaela
Posch, Stefan
Smith, Andrew D.
Grosse, Ivo
collection PubMed
description The binding affinity of DNA-binding proteins such as transcription factors is mainly determined by the base composition of the corresponding binding site on the DNA strand. Most proteins do not bind only a single sequence, but rather a set of sequences, which may be modeled by a sequence motif. Algorithms for de novo motif discovery differ in their promoter models, learning approaches, and other aspects, but typically use the statistically simple position weight matrix model for the motif, which assumes statistical independence among all nucleotides. However, there is no clear justification for that assumption, leading to an ongoing debate about the importance of modeling dependencies between nucleotides within binding sites. In the past, modeling statistical dependencies within binding sites has been hampered by the problem of limited data. With the rise of high-throughput technologies such as ChIP-seq, this situation has now changed, making it possible to make use of statistical dependencies effectively. In this work, we investigate the presence of statistical dependencies in binding sites of the human enhancer-blocking insulator protein CTCF by using the recently developed model class of inhomogeneous parsimonious Markov models, which is capable of modeling complex dependencies while avoiding overfitting. These findings lead to a more detailed characterization of the CTCF binding motif, which is only poorly represented by independent nucleotide frequencies at several positions, predominantly at the 3′ end.
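The independence assumption mentioned in the description can be illustrated with a minimal sketch. It compares a position weight matrix, which multiplies per-position frequencies, with a simple first-order model in which each position depends on its left neighbour. The toy alignment `sites` and both function names are hypothetical, no pseudocounts are used, and the first-order model is only a stand-in for dependency modeling in general, not the inhomogeneous parsimonious Markov models used in the article.

```python
from collections import Counter
from math import prod

# Hypothetical toy alignment (not CTCF data): positions 3 and 4 co-vary,
# i.e. G is always followed by T, and T is always followed by A.
sites = ["ACGT", "ACGT", "ACTA", "ACTA"]


def pwm_probability(seq, sites):
    """P(seq) under a PWM: product of independent per-position frequencies."""
    n = len(sites)
    columns = [Counter(s[i] for s in sites) for i in range(len(seq))]
    return prod(columns[i][seq[i]] / n for i in range(len(seq)))


def first_order_probability(seq, sites):
    """P(seq) when each position is conditioned on its left neighbour."""
    n = len(sites)
    p = sum(s[0] == seq[0] for s in sites) / n
    for i in range(1, len(seq)):
        # Keep only the sites that share the preceding nucleotide.
        context = [s for s in sites if s[i - 1] == seq[i - 1]]
        if not context:
            return 0.0
        p *= sum(s[i] == seq[i] for s in context) / len(context)
    return p


for seq in ("ACGT", "ACGA"):
    print(seq, pwm_probability(seq, sites), first_order_probability(seq, sites))
```

In this toy data the PWM assigns the same probability to the observed site `ACGT` and to the never-observed combination `ACGA`, whereas the first-order model separates them. That is the kind of difference the debate over intra-motif dependencies concerns.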
format Online
Article
Text
id pubmed-3899044
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3899044 2014-01-24 On the Value of Intra-Motif Dependencies of Human Insulator Protein CTCF Eggeling, Ralf Gohr, André Keilwagen, Jens Mohr, Michaela Posch, Stefan Smith, Andrew D. Grosse, Ivo PLoS One Research Article
Public Library of Science 2014-01-22 /pmc/articles/PMC3899044/ /pubmed/24465627 http://dx.doi.org/10.1371/journal.pone.0085629 Text en © 2014 Eggeling et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title On the Value of Intra-Motif Dependencies of Human Insulator Protein CTCF
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3899044/
https://www.ncbi.nlm.nih.gov/pubmed/24465627
http://dx.doi.org/10.1371/journal.pone.0085629