The Next Generation of Transcription Factor Binding Site Prediction

Finding where transcription factors (TFs) bind to the DNA is of key importance to decipher gene regulation at a transcriptional level. Classically, computational prediction of TF binding sites (TFBSs) is based on basic position weight matrices (PWMs) which quantitatively score binding motifs based o...

Bibliographic Details
Main authors: Mathelier, Anthony; Wasserman, Wyeth W.
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3764009/
https://www.ncbi.nlm.nih.gov/pubmed/24039567
http://dx.doi.org/10.1371/journal.pcbi.1003214
_version_ 1782283072571441152
author Mathelier, Anthony
Wasserman, Wyeth W.
author_facet Mathelier, Anthony
Wasserman, Wyeth W.
author_sort Mathelier, Anthony
collection PubMed
description Finding where transcription factors (TFs) bind to the DNA is of key importance to decipher gene regulation at a transcriptional level. Classically, computational prediction of TF binding sites (TFBSs) is based on basic position weight matrices (PWMs) which quantitatively score binding motifs based on the observed nucleotide patterns in a set of TFBSs for the corresponding TF. Such models make the strong assumption that each nucleotide participates independently in the corresponding DNA-protein interaction and do not account for flexible length motifs. We introduce transcription factor flexible models (TFFMs) to represent TF binding properties. Based on hidden Markov models, TFFMs are flexible, and can model both position interdependence within TFBSs and variable length motifs within a single dedicated framework. The availability of thousands of experimentally validated DNA-TF interaction sequences from ChIP-seq allows for the generation of models that perform as well as PWMs for stereotypical TFs and can improve performance for TFs with flexible binding characteristics. We present a new graphical representation of the motifs that convey properties of position interdependence. TFFMs have been assessed on ChIP-seq data sets coming from the ENCODE project, revealing that they can perform better than both PWMs and the dinucleotide weight matrix extension in discriminating ChIP-seq from background sequences. Under the assumption that ChIP-seq signal values are correlated with the affinity of the TF-DNA binding, we find that TFFM scores correlate with ChIP-seq peak signals. Moreover, using available TF-DNA affinity measurements for the Max TF, we demonstrate that TFFMs constructed from ChIP-seq data correlate with published experimentally measured DNA-binding affinities. Finally, TFFMs allow for the straightforward computation of an integrated TF occupancy score across a sequence. These results demonstrate the capacity of TFFMs to accurately model DNA-protein interactions, while providing a single unified framework suitable for the next generation of TFBS prediction.
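As a rough illustration of the classical PWM scoring that the description contrasts with TFFMs, the following Python sketch builds a log-odds matrix from a few aligned sites and scans a sequence with it. The toy sites, the uniform background of 0.25, the pseudocount, and the function names `build_pwm`, `score_window`, and `best_hit` are all assumptions made for this example. None of it is taken from the article, and it is not the authors' TFFM implementation.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length binding sites.

    Assumes a uniform background and a simple additive pseudocount;
    both are illustrative choices, not values from the article.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(width):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against the background.
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm


def score_window(pwm, window):
    """Score one window as a sum of independent per-position terms."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, start) of the best-scoring fixed-width window."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical toy sites, loosely shaped like an E-box; not real data.
toy_sites = ["CACGTG", "CACGTG", "CACGTT", "CATGTG"]
toy_pwm = build_pwm(toy_sites)
top_score, top_start = best_hit(toy_pwm, "TTTCACGTGAAA")
```

Because the score is a plain sum over positions, each nucleotide contributes independently and the motif width is fixed. These are the two limitations that the description says TFFMs are designed to address.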
format Online
Article
Text
id pubmed-3764009
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3764009 2013-09-13 The Next Generation of Transcription Factor Binding Site Prediction Mathelier, Anthony Wasserman, Wyeth W. PLoS Comput Biol Research Article Public Library of Science 2013-09-05 /pmc/articles/PMC3764009/ /pubmed/24039567 http://dx.doi.org/10.1371/journal.pcbi.1003214 Text en © 2013 Mathelier, Wasserman http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title The Next Generation of Transcription Factor Binding Site Prediction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3764009/
https://www.ncbi.nlm.nih.gov/pubmed/24039567
http://dx.doi.org/10.1371/journal.pcbi.1003214