
A General Pairwise Interaction Model Provides an Accurate Description of In Vivo Transcription Factor Binding Sites



Bibliographic Details
Main authors: Santolini, Marc; Mora, Thierry; Hakim, Vincent
Format: Online article (text)
Language: English
Published: Public Library of Science, 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4057186/
https://www.ncbi.nlm.nih.gov/pubmed/24926895
http://dx.doi.org/10.1371/journal.pone.0099015
Abstract: The identification of transcription factor binding sites (TFBSs) on genomic DNA is of crucial importance for understanding and predicting regulatory elements in gene networks. TFBS motifs are commonly described by Position Weight Matrices (PWMs), in which each DNA base pair contributes independently to transcription factor (TF) binding. However, this description ignores correlations between nucleotides at different positions, and is generally inaccurate: analysing fly and mouse in vivo ChIP-seq data, we show that in most cases the PWM model fails to reproduce the observed statistics of TFBSs. To overcome this issue, we introduce the pairwise interaction model (PIM), a generalization of the PWM model. The model is based on the principle of maximum entropy and explicitly describes pairwise correlations between nucleotides at different positions, while being otherwise as unconstrained as possible. It is mathematically equivalent to considering a TF-DNA binding energy that depends additively on each nucleotide identity at all positions in the TFBS, like the PWM model, but also additively on pairs of nucleotides. We find that the PIM significantly improves over the PWM model, and even provides an optimal description of TFBS statistics within statistical noise. The PIM generalizes previous approaches to interdependent positions: it accounts for co-variation of two or more base pairs and predicts secondary motifs, while outperforming multiple-motif models consisting of mixtures of PWMs. We analyse the structure of pairwise interactions between nucleotides and find that they are sparse and predominantly located between consecutive base pairs in the flanking regions of TFBSs. Nonetheless, interactions between pairs of non-consecutive nucleotides are found to play a significant role in the accuracy of the resulting description of TFBS statistics. The PIM is computationally tractable, and provides a general framework that should be useful for describing and predicting TFBSs beyond PWMs.
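The description contrasts two scoring schemes: a PWM, where each position contributes independently, and the PIM, where pairs of positions add extra terms. A common way to write such a maximum-entropy pairwise model is P(s) ∝ exp(Σ_i h_i(s_i) + Σ_{i<j} J_ij(s_i, s_j)), with J = 0 recovering a PWM. The following is a minimal Python/NumPy sketch under that assumed sign convention (higher score means a more probable site); the function names `encode`, `pwm_score`, `pim_score` and `connected_correlations` and the array layouts `h[i, a]` and `J[i, j, a, b]` are hypothetical illustrations, not the authors' code. It does not fit h and J; it only scores a site and computes the pairwise connected correlations that a model of this kind is built to reproduce and that an independent-site PWM cannot.

```python
"""Hypothetical sketch: PWM vs pairwise-interaction (PIM) scoring of DNA sites.

Assumptions (not from the source): parameters are NumPy arrays h[i, a] and
J[i, j, a, b]; score = sum_i h[i, s_i] + sum_{i<j} J[i, j, s_i, s_j].
No fitting is done here; parameter values in the demo are illustrative.
"""
import numpy as np

ALPHABET = "ACGT"
INDEX = {base: k for k, base in enumerate(ALPHABET)}


def encode(site: str) -> np.ndarray:
    """Map a DNA string to integers (A=0, C=1, G=2, T=3); other letters raise KeyError."""
    return np.array([INDEX[base] for base in site.upper()], dtype=int)


def pwm_score(seq: np.ndarray, h: np.ndarray) -> float:
    """Independent-site score: sum over positions i of h[i, s_i]."""
    return float(h[np.arange(len(seq)), seq].sum())


def pim_score(seq: np.ndarray, h: np.ndarray, J: np.ndarray) -> float:
    """PWM score plus pairwise terms J[i, j, s_i, s_j] for every pair i < j."""
    score = pwm_score(seq, h)
    length = len(seq)
    for i in range(length):
        for j in range(i + 1, length):
            score += J[i, j, seq[i], seq[j]]
    return float(score)


def connected_correlations(sites: list[str]) -> np.ndarray:
    """Return C[i, j, a, b] = f_ij(a, b) - f_i(a) * f_j(b) over aligned, equal-length sites."""
    X = np.array([encode(s) for s in sites])             # shape (N, L)
    n_sites = X.shape[0]
    onehot = np.eye(4)[X]                                 # shape (N, L, 4)
    f1 = onehot.mean(axis=0)                              # single-site frequencies, (L, 4)
    f2 = np.einsum("nia,njb->ijab", onehot, onehot) / n_sites  # pair frequencies, (L, L, 4, 4)
    return f2 - np.einsum("ia,jb->ijab", f1, f1)


if __name__ == "__main__":
    # Two perfectly coupled positions (A with T, C with G): nonzero correlation,
    # which a model with independent positions cannot reproduce.
    C = connected_correlations(["AT", "AT", "CG", "CG"])
    print(C[0, 1, INDEX["A"], INDEX["T"]])  # 0.25
```

Setting every entry of `J` to zero makes `pim_score` equal to `pwm_score`, which mirrors the sense in which the PIM generalizes the PWM. The double loop is quadratic in site length, which is harmless for typical motif widths.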
Journal: PLoS One, Research Article, published online 2014-06-13 by the Public Library of Science.
License: © 2014 Santolini et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.