T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets

BACKGROUND: A protein may bind to its target DNA sites constitutively, i.e., regardless of cell type. Intuitively, constitutive binding sites should be biologically functional. A prerequisite for understanding their functional relevance is knowing all their locations for a protein of interest. Genom...

Bibliographic Details
Main Authors:	Li, Yuanyuan; Umbach, David M; Li, Leping
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3903014/ https://www.ncbi.nlm.nih.gov/pubmed/24428924 http://dx.doi.org/10.1186/1471-2164-15-27

_version_	1782301058168520704
author	Li, Yuanyuan; Umbach, David M; Li, Leping
author_facet	Li, Yuanyuan; Umbach, David M; Li, Leping
author_sort	Li, Yuanyuan
collection	PubMed
description	BACKGROUND: A protein may bind to its target DNA sites constitutively, i.e., regardless of cell type. Intuitively, constitutive binding sites should be biologically functional. A prerequisite for understanding their functional relevance is knowing all their locations for a protein of interest. Genome-wide discovery of constitutive binding sites requires robust and efficient computational methods to integrate results from numerous binding experiments. Such methods are lacking, however. RESULTS: To locate constitutive binding sites for a protein using ChIP-seq data for that protein from multiple cell lines, we developed a method, T-KDE, which combines a binary range tree with a kernel density estimator. Using 132 CTCF (CCCTC-binding factor) ChIP-seq datasets, we showed that the number of constitutive sites identified by T-KDE is robust to the choice of tuning parameter and that T-KDE identifies binding site locations more accurately than a binning approach. Furthermore, T-KDE can identify constitutive sites that are missed by a motif-based approach either because a bound site failed to reach the motif significance cutoff or because the peak sequence scanned was too short. By studying sites declared constitutive by T-KDE but not by the motif-based approach, we discovered two new CTCF motif variants. Using ENCODE data on 22 transcription factors (TF) in 132 cell lines, we identified constitutive binding sites for each TF and provide evidence that, for some TFs, they may be biologically meaningful. CONCLUSIONS: T-KDE is an efficient and effective method to predict constitutive protein binding sites using ChIP-seq peaks from multiple cell lines. Besides constitutive binding sites for a given protein, T-KDE can identify genomic “hot spots” where several different proteins bind and, conversely, cell-type-specific sites bound by a given protein.
format	Online Article Text
id	pubmed-3903014
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3903014 2014-02-11 T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets Li, Yuanyuan Umbach, David M Li, Leping BMC Genomics Methodology Article BACKGROUND: A protein may bind to its target DNA sites constitutively, i.e., regardless of cell type. Intuitively, constitutive binding sites should be biologically functional. A prerequisite for understanding their functional relevance is knowing all their locations for a protein of interest. Genome-wide discovery of constitutive binding sites requires robust and efficient computational methods to integrate results from numerous binding experiments. Such methods are lacking, however. RESULTS: To locate constitutive binding sites for a protein using ChIP-seq data for that protein from multiple cell lines, we developed a method, T-KDE, which combines a binary range tree with a kernel density estimator. Using 132 CTCF (CCCTC-binding factor) ChIP-seq datasets, we showed that the number of constitutive sites identified by T-KDE is robust to the choice of tuning parameter and that T-KDE identifies binding site locations more accurately than a binning approach. Furthermore, T-KDE can identify constitutive sites that are missed by a motif-based approach either because a bound site failed to reach the motif significance cutoff or because the peak sequence scanned was too short. By studying sites declared constitutive by T-KDE but not by the motif-based approach, we discovered two new CTCF motif variants. Using ENCODE data on 22 transcription factors (TF) in 132 cell lines, we identified constitutive binding sites for each TF and provide evidence that, for some TFs, they may be biologically meaningful. CONCLUSIONS: T-KDE is an efficient and effective method to predict constitutive protein binding sites using ChIP-seq peaks from multiple cell lines. Besides constitutive binding sites for a given protein, T-KDE can identify genomic “hot spots” where several different proteins bind and, conversely, cell-type-specific sites bound by a given protein. BioMed Central 2014-01-15 /pmc/articles/PMC3903014/ /pubmed/24428924 http://dx.doi.org/10.1186/1471-2164-15-27 Text en Copyright © 2014 Li et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Li, Yuanyuan Umbach, David M Li, Leping T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets
title	T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets
title_full	T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets
title_fullStr	T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets
title_full_unstemmed	T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets
title_short	T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets
title_sort	t-kde: a method for genome-wide identification of constitutive protein binding sites from multiple chip-seq data sets
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3903014/ https://www.ncbi.nlm.nih.gov/pubmed/24428924 http://dx.doi.org/10.1186/1471-2164-15-27
work_keys_str_mv	AT liyuanyuan tkdeamethodforgenomewideidentificationofconstitutiveproteinbindingsitesfrommultiplechipseqdatasets AT umbachdavidm tkdeamethodforgenomewideidentificationofconstitutiveproteinbindingsitesfrommultiplechipseqdatasets AT lileping tkdeamethodforgenomewideidentificationofconstitutiveproteinbindingsitesfrommultiplechipseqdatasets

Similar Items