A Feature-Based Approach to Modeling Protein–DNA Interactions

Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. However, in many cases, this simplify...

Bibliographic Details
Main Authors: Sharon, Eilon; Lubliner, Shai; Segal, Eran
Format: Text
Language: English
Published: Public Library of Science 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2516605/
https://www.ncbi.nlm.nih.gov/pubmed/18725950
http://dx.doi.org/10.1371/journal.pcbi.1000154
author Sharon, Eilon
Lubliner, Shai
Segal, Eran
collection PubMed
description Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. However, in many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF–DNA interactions, based on log-linear models. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our model and devise an algorithm for learning its structural features from binding site data. We also developed a discriminative motif finder, which discovers de novo FMMs that are enriched in target sets of sequences compared to background sets. We evaluate our approach on synthetic data and on the widely used TF chromatin immunoprecipitation (ChIP) dataset of Harbison et al. We then apply our algorithm to high-throughput TF ChIP data from mouse and human, reveal sequence features that are present in the binding specificities of mouse and human TFs, and show that FMMs explain TF binding significantly better than PSSMs. Our FMM learning and motif finder software are available at http://genie.weizmann.ac.il/.
format Text
id pubmed-2516605
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-25166052008-08-22 A Feature-Based Approach to Modeling Protein–DNA Interactions Sharon, Eilon Lubliner, Shai Segal, Eran PLoS Comput Biol Research Article Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. However, in many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF–DNA interactions, based on log-linear models. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our model and devise an algorithm for learning its structural features from binding site data. We also developed a discriminative motif finder, which discovers de novo FMMs that are enriched in target sets of sequences compared to background sets. We evaluate our approach on synthetic data and on the widely used TF chromatin immunoprecipitation (ChIP) dataset of Harbison et al. We then apply our algorithm to high-throughput TF ChIP data from mouse and human, reveal sequence features that are present in the binding specificities of mouse and human TFs, and show that FMMs explain TF binding significantly better than PSSMs. Our FMM learning and motif finder software are available at http://genie.weizmann.ac.il/. Public Library of Science 2008-08-22 /pmc/articles/PMC2516605/ /pubmed/18725950 http://dx.doi.org/10.1371/journal.pcbi.1000154 Text en Sharon et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A Feature-Based Approach to Modeling Protein–DNA Interactions
topic Research Article
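
Illustrative sketch (not part of the catalogued record, and not the authors' software): the abstract contrasts a PSSM, which treats binding positions independently, with a log-linear model whose features may span several positions. The toy Python below may help make that contrast concrete. All names (`pssm_log_prob`, `feature_score`, `log_linear_log_prob`), the two-position motif, and every weight are hypothetical choices for illustration. The partition function is computed by brute-force enumeration, which is only feasible for very short motifs, and no feature learning is attempted.

```python
import itertools
import math

ALPHABET = "ACGT"

def pssm_log_prob(seq, pssm):
    """Log-probability under a PSSM: each position contributes independently."""
    return sum(math.log(pssm[i][base]) for i, base in enumerate(seq))

def feature_score(seq, features):
    """Unnormalized log-linear score: sum of weights of the features present.

    Each feature is (positions, bases, weight); it fires when seq carries the
    given bases at the given positions, so one feature may span several positions.
    """
    total = 0.0
    for positions, bases, weight in features:
        if all(seq[p] == b for p, b in zip(positions, bases)):
            total += weight
    return total

def log_linear_log_prob(seq, features):
    """Normalized log-probability; the partition function is enumerated by brute force."""
    log_z = math.log(sum(
        math.exp(feature_score("".join(s), features))
        for s in itertools.product(ALPHABET, repeat=len(seq))
    ))
    return feature_score(seq, features) - log_z

# Hypothetical two-position PSSM (made-up probabilities, each column sums to 1).
pssm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]

# Single-position features with log-probability weights reproduce the PSSM.
single_features = [
    ((i,), (base,), math.log(p))
    for i, column in enumerate(pssm)
    for base, p in column.items()
]

# One extra feature spanning both positions rewards the joint pattern "AT",
# a dependency that no PSSM can express.
pair_features = single_features + [((0, 1), ("A", "T"), 1.0)]

if __name__ == "__main__":
    for seq in ("AT", "AA", "CT"):
        print(seq,
              round(math.exp(pssm_log_prob(seq, pssm)), 3),
              round(math.exp(log_linear_log_prob(seq, pair_features)), 3))
```

With only single-position features the two models assign the same probabilities. Adding the pair feature shifts probability mass toward "AT" and away from every other sequence, which is the kind of inter-position dependency the abstract says a PSSM cannot capture.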