
Disentangling transcription factor binding site complexity

The binding motifs of many transcription factors (TFs) comprise a higher degree of complexity than a single position weight matrix model permits. Additional complexity is typically taken into account either as intra-motif dependencies via more sophisticated probabilistic models or as heterogeneities...


Bibliographic Details
Main author: Eggeling, Ralf
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6237759/
https://www.ncbi.nlm.nih.gov/pubmed/30085218
http://dx.doi.org/10.1093/nar/gky683
author Eggeling, Ralf
collection PubMed
description The binding motifs of many transcription factors (TFs) comprise a higher degree of complexity than a single position weight matrix model permits. Additional complexity is typically taken into account either as intra-motif dependencies via more sophisticated probabilistic models or as heterogeneities via multiple weight matrices. However, both orthogonal approaches have limitations when learning from in vivo data where binding sites of other factors in close proximity can interfere with motif discovery for the protein of interest. In this work, we demonstrate how intra-motif complexity can, purely by analyzing the statistical properties of a given set of TF-binding sites, be distinguished from complexity arising from an intermix with motifs of co-binding TFs or other artifacts. In addition, we study the related question whether intra-motif complexity is represented more effectively by dependencies, heterogeneities or variants in between. Benchmarks demonstrate the effectiveness of both methods for their respective tasks and applications on motif discovery output from recent tools detect and correct many undesirable artifacts. These results further suggest that the prevalence of intra-motif dependencies may have been overestimated in previous studies on in vivo data and should thus be reassessed.
format Online
Article
Text
id pubmed-6237759
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6237759 2018-11-21 Disentangling transcription factor binding site complexity Eggeling, Ralf Nucleic Acids Res Methods Online Oxford University Press 2018-11-16 2018-08-01 /pmc/articles/PMC6237759/ /pubmed/30085218 http://dx.doi.org/10.1093/nar/gky683 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Disentangling transcription factor binding site complexity
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6237759/
https://www.ncbi.nlm.nih.gov/pubmed/30085218
http://dx.doi.org/10.1093/nar/gky683