T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets

BACKGROUND: A protein may bind to its target DNA sites constitutively, i.e., regardless of cell type. Intuitively, constitutive binding sites should be biologically functional. A prerequisite for understanding their functional relevance is knowing all their locations for a protein of interest. Genom...

Bibliographic Details
Main Authors: Li, Yuanyuan; Umbach, David M; Li, Leping
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3903014/
https://www.ncbi.nlm.nih.gov/pubmed/24428924
http://dx.doi.org/10.1186/1471-2164-15-27
_version_ 1782301058168520704
author Li, Yuanyuan
Umbach, David M
Li, Leping
collection PubMed
description BACKGROUND: A protein may bind to its target DNA sites constitutively, i.e., regardless of cell type. Intuitively, constitutive binding sites should be biologically functional. A prerequisite for understanding their functional relevance is knowing all their locations for a protein of interest. Genome-wide discovery of constitutive binding sites requires robust and efficient computational methods to integrate results from numerous binding experiments. Such methods are lacking, however. RESULTS: To locate constitutive binding sites for a protein using ChIP-seq data for that protein from multiple cell lines, we developed a method, T-KDE, which combines a binary range tree with a kernel density estimator. Using 132 CTCF (CCCTC-binding factor) ChIP-seq datasets, we showed that the number of constitutive sites identified by T-KDE is robust to the choice of tuning parameter and that T-KDE identifies binding site locations more accurately than a binning approach. Furthermore, T-KDE can identify constitutive sites that are missed by a motif-based approach either because a bound site failed to reach the motif significance cutoff or because the peak sequence scanned was too short. By studying sites declared constitutive by T-KDE but not by the motif-based approach, we discovered two new CTCF motif variants. Using ENCODE data on 22 transcription factors (TF) in 132 cell lines, we identified constitutive binding sites for each TF and provide evidence that, for some TFs, they may be biologically meaningful. CONCLUSIONS: T-KDE is an efficient and effective method to predict constitutive protein binding sites using ChIP-seq peaks from multiple cell lines. Besides constitutive binding sites for a given protein, T-KDE can identify genomic “hot spots” where several different proteins bind and, conversely, cell-type-specific sites bound by a given protein.
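The idea summarized in the description can be illustrated with a short sketch. This is not the authors' T-KDE implementation: the gap-based split below is a simple stand-in for the binary range tree, and the function names, the `(position, cell_line)` input format, and the `bandwidth`, `max_gap`, and `min_fraction` parameters are all assumptions made for illustration. It pools peak positions from several cell lines, locates modes of a Gaussian kernel density within each group of nearby peaks, and reports a mode as constitutive when peaks from a large enough fraction of cell lines fall nearest to it.

```python
import math


def split_by_gap(peaks, max_gap):
    """Group sorted (position, cell_line) peaks; a gap > max_gap starts a new group.

    Stand-in for the range-tree partitioning named in the abstract (assumption).
    """
    peaks = sorted(peaks)
    clusters, current = [], [peaks[0]]
    for pk in peaks[1:]:
        if pk[0] - current[-1][0] > max_gap:
            clusters.append(current)
            current = [pk]
        else:
            current.append(pk)
    clusters.append(current)
    return clusters


def kde_modes(positions, bandwidth):
    """Return grid positions (1 bp steps) where an unnormalized Gaussian KDE peaks."""
    grid = list(range(min(positions), max(positions) + 1))
    dens = [
        sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in positions)
        for g in grid
    ]
    modes = []
    for i, d in enumerate(dens):
        left = dens[i - 1] if i > 0 else -1.0
        right = dens[i + 1] if i + 1 < len(dens) else -1.0
        if d > left and d >= right:  # local maximum (first point of a plateau)
            modes.append(grid[i])
    return modes


def constitutive_sites(peaks, n_cell_lines, bandwidth=20, max_gap=200,
                       min_fraction=0.9):
    """Return (mode_position, n_supporting_cell_lines) for well-supported modes."""
    sites = []
    for cluster in split_by_gap(peaks, max_gap):
        modes = kde_modes([pos for pos, _ in cluster], bandwidth)
        support = {m: set() for m in modes}
        for pos, cell in cluster:
            nearest = min(modes, key=lambda m: abs(m - pos))
            support[nearest].add(cell)  # count each cell line once per mode
        for m in modes:
            if len(support[m]) / n_cell_lines >= min_fraction:
                sites.append((m, len(support[m])))
    return sites


# Toy data: five hypothetical cell lines share a site near 1000;
# only two have a peak near 5000.
toy_peaks = [(998, "A"), (1000, "B"), (1001, "C"), (1002, "D"), (999, "E"),
             (5000, "A"), (5010, "B")]
result = constitutive_sites(toy_peaks, n_cell_lines=5)
```

In this toy input, only the site shared by all five cell lines passes the 0.9 threshold; the site seen in two cell lines is treated as cell-type-specific.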
format Online
Article
Text
id pubmed-3903014
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3903014 2014-02-11
T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets
Li, Yuanyuan
Umbach, David M
Li, Leping
BMC Genomics
Methodology Article
BioMed Central 2014-01-15
/pmc/articles/PMC3903014/
/pubmed/24428924
http://dx.doi.org/10.1186/1471-2164-15-27
Text en
Copyright © 2014 Li et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3903014/
https://www.ncbi.nlm.nih.gov/pubmed/24428924
http://dx.doi.org/10.1186/1471-2164-15-27