A General Pairwise Interaction Model Provides an Accurate Description of In Vivo Transcription Factor Binding Sites
The identification of transcription factor binding sites (TFBSs) on genomic DNA is of crucial importance for understanding and predicting regulatory elements in gene networks. TFBS motifs are commonly described by Position Weight Matrices (PWMs), in which each DNA base pair contributes independently...
Main authors: | Santolini, Marc; Mora, Thierry; Hakim, Vincent |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4057186/ https://www.ncbi.nlm.nih.gov/pubmed/24926895 http://dx.doi.org/10.1371/journal.pone.0099015 |
_version_ | 1782320914262654976 |
---|---|
author | Santolini, Marc Mora, Thierry Hakim, Vincent |
author_facet | Santolini, Marc Mora, Thierry Hakim, Vincent |
author_sort | Santolini, Marc |
collection | PubMed |
description | The identification of transcription factor binding sites (TFBSs) on genomic DNA is of crucial importance for understanding and predicting regulatory elements in gene networks. TFBS motifs are commonly described by Position Weight Matrices (PWMs), in which each DNA base pair contributes independently to the transcription factor (TF) binding. However, this description ignores correlations between nucleotides at different positions, and is generally inaccurate: analysing fly and mouse in vivo ChIPseq data, we show that in most cases the PWM model fails to reproduce the observed statistics of TFBSs. To overcome this issue, we introduce the pairwise interaction model (PIM), a generalization of the PWM model. The model is based on the principle of maximum entropy and explicitly describes pairwise correlations between nucleotides at different positions, while being otherwise as unconstrained as possible. It is mathematically equivalent to considering a TF-DNA binding energy that depends additively on each nucleotide identity at all positions in the TFBS, like the PWM model, but also additively on pairs of nucleotides. We find that the PIM significantly improves over the PWM model, and even provides an optimal description of TFBS statistics within statistical noise. The PIM generalizes previous approaches to interdependent positions: it accounts for co-variation of two or more base pairs, and predicts secondary motifs, while outperforming multiple-motif models consisting of mixtures of PWMs. We analyse the structure of pairwise interactions between nucleotides, and find that they are sparse and dominantly located between consecutive base pairs in the flanking region of TFBS. Nonetheless, interactions between pairs of non-consecutive nucleotides are found to play a significant role in the obtained accurate description of TFBS statistics. The PIM is computationally tractable, and provides a general framework that should be useful for describing and predicting TFBSs beyond PWMs. |
format | Online Article Text |
id | pubmed-4057186 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4057186 2014-06-18 A General Pairwise Interaction Model Provides an Accurate Description of In Vivo Transcription Factor Binding Sites Santolini, Marc Mora, Thierry Hakim, Vincent PLoS One Research Article Public Library of Science 2014-06-13 /pmc/articles/PMC4057186/ /pubmed/24926895 http://dx.doi.org/10.1371/journal.pone.0099015 Text en © 2014 Santolini et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Santolini, Marc Mora, Thierry Hakim, Vincent A General Pairwise Interaction Model Provides an Accurate Description of In Vivo Transcription Factor Binding Sites |
title | A General Pairwise Interaction Model Provides an Accurate Description of In Vivo Transcription Factor Binding Sites |
title_sort | general pairwise interaction model provides an accurate description of in vivo transcription factor binding sites |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4057186/ https://www.ncbi.nlm.nih.gov/pubmed/24926895 http://dx.doi.org/10.1371/journal.pone.0099015 |
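
The description above states that the PIM corresponds to a binding energy that is additive over single nucleotides, as in a PWM, plus additive terms over pairs of nucleotides. The following is a minimal illustrative sketch of that idea only, not the authors' implementation: the function names (`pwm_energy`, `pim_energy`, `boltzmann`), the sign convention, and the toy parameters `h` and `J` are all assumptions made for this example.

```python
import itertools
import math

BASES = "ACGT"

def pwm_energy(seq, h):
    """Additive (PWM-like) energy: one independent term per position."""
    return -sum(h[i][b] for i, b in enumerate(seq))

def pim_energy(seq, h, J):
    """PWM-like energy plus additive pair terms J[(i, j)][(a, b)], i < j."""
    pair = sum(
        J.get((i, j), {}).get((seq[i], seq[j]), 0.0)
        for i, j in itertools.combinations(range(len(seq)), 2)
    )
    return pwm_energy(seq, h) - pair

def boltzmann(energy_fn, length):
    """Normalised P(s) proportional to exp(-E(s)) over all sequences."""
    seqs = ["".join(p) for p in itertools.product(BASES, repeat=length)]
    weights = [math.exp(-energy_fn(s)) for s in seqs]
    z = sum(weights)
    return {s: w / z for s, w in zip(seqs, weights)}

# Hypothetical toy parameters for a 3-bp site (not fitted to any data).
h = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
    {"A": 0.0, "C": 0.5, "G": 0.5, "T": 0.0},
]
# A single sparse coupling between non-consecutive positions 0 and 2.
J = {(0, 2): {("A", "C"): 0.8}}

p_pwm = boltzmann(lambda s: pwm_energy(s, h), 3)
p_pim = boltzmann(lambda s: pim_energy(s, h, J), 3)
```

With `J` empty the two energies coincide, so in this sketch the pairwise model reduces to the independent-site model; a non-zero coupling shifts probability toward sequences carrying the favoured pair. Enumerating all sequences is only feasible for very short sites and is used here purely to make the normalisation explicit.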