
Automated incorporation of pairwise dependency in transcription factor binding site prediction using dinucleotide weight tensors

Gene regulatory networks are ultimately encoded by the sequence-specific binding of transcription factors (TFs) to short DNA segments. Although it is customary to represent the binding specificity of a TF by a position-specific weight matrix (PSWM), which assumes each position within a site contributes independently to t...


Bibliographic Details
Main authors: Omidi, Saeed; Zavolan, Mihaela; Pachkov, Mikhail; Breda, Jeremie; Berger, Severin; van Nimwegen, Erik
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5550003/
https://www.ncbi.nlm.nih.gov/pubmed/28753602
http://dx.doi.org/10.1371/journal.pcbi.1005176
author Omidi, Saeed
Zavolan, Mihaela
Pachkov, Mikhail
Breda, Jeremie
Berger, Severin
van Nimwegen, Erik
collection PubMed
description Gene regulatory networks are ultimately encoded by the sequence-specific binding of transcription factors (TFs) to short DNA segments. Although it is customary to represent the binding specificity of a TF by a position-specific weight matrix (PSWM), which assumes each position within a site contributes independently to the overall binding affinity, evidence has been accumulating that there can be significant dependencies between positions. Unfortunately, methodological challenges have so far hindered the development of a practical and generally accepted extension of the PSWM model. On the one hand, simple models that only consider dependencies between nearest-neighbor positions are easy to use in practice, but fail to account for the distal dependencies that are observed in the data. On the other hand, models that allow for arbitrary dependencies are prone to overfitting, requiring regularization schemes that are difficult for non-experts to use in practice. Here we present a new regulatory motif model, called dinucleotide weight tensor (DWT), that incorporates arbitrary pairwise dependencies between positions in binding sites, rigorously from first principles, and free from tunable parameters. We demonstrate the power of the method on a large set of ChIP-seq datasets, showing that DWTs outperform both PSWMs and motif models that only incorporate nearest-neighbor dependencies. We also demonstrate that DWTs outperform two previously proposed methods. Finally, we show that DWTs inferred from ChIP-seq data also outperform PSWMs on HT-SELEX data for the same TF, suggesting that DWTs capture inherent biophysical properties of the interactions between the DNA binding domains of TFs and their binding sites. We make a suite of DWT tools available at dwt.unibas.ch, which allow users to automatically perform ‘motif finding’, i.e. the inference of DWT motifs from a set of sequences, binding site prediction with DWTs, and visualization of DWT ‘dilogo’ motifs.
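The independence assumption mentioned in the description can be made concrete with a small sketch. The code below is not the DWT method from the article; it is an illustrative toy, assuming a made-up set of aligned sites, a uniform background, and a pseudocount of 0.5 (all hypothetical choices). It scores sequences with a plain PSWM, where each position contributes its own log-odds term, and separately measures the mutual information between two positions, one simple way to see a pairwise dependency that a PSWM cannot represent.

```python
import math
from collections import Counter

BASES = "ACGT"


def column_freqs(sites, i, pseudo=0.5):
    """Base frequencies at position i; the pseudocount is an illustrative choice."""
    counts = Counter(s[i] for s in sites)
    total = len(sites) + 4 * pseudo
    return {b: (counts[b] + pseudo) / total for b in BASES}


def pswm_log_odds(sites, seq, background=0.25, pseudo=0.5):
    """PSWM score of seq: a sum of independent per-position log-odds terms."""
    return sum(
        math.log(column_freqs(sites, i, pseudo)[base] / background)
        for i, base in enumerate(seq)
    )


def mutual_information(sites, i, j):
    """Mutual information (bits) between positions i and j of the aligned sites."""
    n = len(sites)
    pi = Counter(s[i] for s in sites)
    pj = Counter(s[j] for s in sites)
    pij = Counter((s[i], s[j]) for s in sites)
    return sum(
        (c / n) * math.log2((c / n) / ((pi[a] / n) * (pj[b] / n)))
        for (a, b), c in pij.items()
    )


# Hypothetical aligned sites: positions 0 and 3 always co-vary (A..T or C..G).
sites = ["AGGT", "ACGT", "CGGG", "CCGG", "AGGT", "CCGG"]

print("MI(0,3):", mutual_information(sites, 0, 3))  # distal pair, fully dependent
print("MI(0,1):", mutual_information(sites, 0, 1))  # adjacent pair, weakly dependent
# "AGGG" never occurs among the sites, yet the PSWM scores it like "AGGT",
# because each column is evaluated on its own.
print(pswm_log_odds(sites, "AGGT"), pswm_log_odds(sites, "AGGG"))
```

In this toy data the dependency sits between non-adjacent positions, so a model restricted to nearest-neighbor pairs would miss it as well, which is the gap the description says arbitrary pairwise models are meant to close.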
format Online
Article
Text
id pubmed-5550003
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5550003 2017-08-15 Automated incorporation of pairwise dependency in transcription factor binding site prediction using dinucleotide weight tensors. Omidi, Saeed; Zavolan, Mihaela; Pachkov, Mikhail; Breda, Jeremie; Berger, Severin; van Nimwegen, Erik. PLoS Comput Biol. Research Article. Public Library of Science 2017-07-28 /pmc/articles/PMC5550003/ /pubmed/28753602 http://dx.doi.org/10.1371/journal.pcbi.1005176 Text en © 2017 Omidi et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Automated incorporation of pairwise dependency in transcription factor binding site prediction using dinucleotide weight tensors
topic Research Article