
High Resolution Models of Transcription Factor-DNA Affinities Improve In Vitro and In Vivo Binding Predictions

Bibliographic Details
Main Authors:	Agius, Phaedra; Arvey, Aaron; Chang, William; Noble, William Stafford; Leslie, Christina
Format:	Text
Language:	English
Published:	Public Library of Science, 2010
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2936517/ https://www.ncbi.nlm.nih.gov/pubmed/20838582 http://dx.doi.org/10.1371/journal.pcbi.1000916

Abstract
Accurately modeling the DNA sequence preferences of transcription factors (TFs), and using these models to predict in vivo genomic binding sites for TFs, are key pieces in deciphering the regulatory code. These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices (PSSMs), which may match large numbers of sites and produce an unreliable list of target genes. Recently, protein binding microarray (PBM) experiments have emerged as a new source of high-resolution data on in vitro TF binding specificities. PBM data has been analyzed either by estimating PSSMs or via rank statistics on probe intensities, so that individual sequence patterns are assigned enrichment scores (E-scores). This representation is informative but unwieldy because every TF is assigned a list of thousands of scored sequence patterns. Meanwhile, high-resolution in vivo TF occupancy data from ChIP-seq experiments is also increasingly available. We have developed a flexible discriminative framework for learning TF binding preferences from high-resolution in vitro and in vivo data. We first trained support vector regression (SVR) models on PBM data to learn the mapping from probe sequences to binding intensities. We used a novel k-mer based string kernel called the di-mismatch kernel to represent probe sequence similarities. The SVR models are more compact than E-scores, more expressive than PSSMs, and can be readily used to scan genomic regions to predict in vivo occupancy. Using a large data set of yeast and mouse TFs, we found that our SVR models can better predict probe intensity than the E-score method or PBM-derived PSSMs. Moreover, by using SVRs to score yeast, mouse, and human genomic regions, we were better able to predict genomic occupancy as measured by ChIP-chip and ChIP-seq experiments. Finally, we found that by training kernel-based models directly on ChIP-seq data, we greatly improved in vivo occupancy prediction, and by comparing a TF's in vitro and in vivo models, we could identify cofactors and disambiguate direct and indirect binding.
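The abstract names the di-mismatch kernel and says the SVR models can scan genomic regions, but it gives no formulas. The Python sketch below illustrates both ideas under stated assumptions. The kernel definition used here is an ASSUMPTION for illustration, not necessarily the exact published one: each k-mer of a sequence contributes to every k-mer within m mismatches of it, weighted by the number of dinucleotides the two share, and the kernel is the inner product of those feature vectors. All function names, the defaults k=8 and m=1, and the support sequences, dual coefficients and bias are hypothetical; a real model would obtain the latter from SVR training on PBM probe intensities. Taking the maximum over fixed-length windows is only one simple way to turn window scores into a region score.

```python
from collections import Counter


def neighborhood(alpha, m, alphabet="ACGT"):
    """Return every string over `alphabet` within Hamming distance m of alpha."""
    found = {alpha}
    frontier = {alpha}
    for _ in range(m):
        step = set()
        for s in frontier:
            for i, base in enumerate(s):
                for c in alphabet:
                    if c != base:
                        step.add(s[:i] + c + s[i + 1:])
        frontier = step - found  # new strings: exactly one more substitution away
        found |= step
    return found


def matching_dinucleotides(a, b):
    """Count adjacent position pairs (i, i+1) at which a and b agree at both positions."""
    return sum(1 for i in range(len(a) - 1) if a[i] == b[i] and a[i + 1] == b[i + 1])


def di_mismatch_features(seq, k=8, m=1):
    """ASSUMED feature map: phi[beta] sums, over each k-mer alpha of seq lying
    within m mismatches of beta, the number of dinucleotides alpha and beta share."""
    phi = Counter()
    for i in range(len(seq) - k + 1):
        alpha = seq[i:i + k]
        for beta in neighborhood(alpha, m):
            phi[beta] += matching_dinucleotides(alpha, beta)
    return phi


def di_mismatch_kernel(x, y, k=8, m=1):
    """Kernel value as the inner product of the two explicit feature vectors."""
    fx = di_mismatch_features(x, k, m)
    fy = di_mismatch_features(y, k, m)
    return sum(v * fy[beta] for beta, v in fx.items())


def svr_predict(seq, support_seqs, dual_coefs, bias, k=8, m=1):
    """Kernel SVR decision function f(x) = sum_i c_i * K(x_i, x) + b.
    support_seqs, dual_coefs and bias are HYPOTHETICAL stand-ins for a trained model."""
    return bias + sum(c * di_mismatch_kernel(s, seq, k, m)
                      for s, c in zip(support_seqs, dual_coefs))


def scan_region(region, window, support_seqs, dual_coefs, bias, k=8, m=1):
    """Score each fixed-length window of region; return (start, score) of the
    best window, or None if region is shorter than window."""
    best = None
    for start in range(len(region) - window + 1):
        score = svr_predict(region[start:start + window],
                            support_seqs, dual_coefs, bias, k, m)
        if best is None or score > best[1]:
            best = (start, score)
    return best


print(di_mismatch_kernel("ACGT", "ACGT", k=4, m=1))                      # 39
print(scan_region("TTTTACGTTTTT", 4, ["ACGT"], [1.0], 0.0, k=4, m=1))   # (4, 39.0)
```

The value 39 can be checked by hand: the exact 4-mer shares 3 dinucleotides with itself (contributing 3²), and each of its 12 one-mismatch neighbors shares 2 or 1 dinucleotides depending on whether the mismatch sits at an end or in the middle (3 × (4 + 1 + 1 + 4) = 30). Building the kernel from an explicit feature map keeps it positive semidefinite by construction, which a kernel SVR solver relies on; a practical version would likely precompute the support-vector features once instead of recomputing them for every window.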
Journal:	PLoS Comput Biol
Published online:	2010-09-09
Rights:	Agius et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
