T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets
Main authors: | Li, Yuanyuan; Umbach, David M; Li, Leping |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2014 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3903014/ https://www.ncbi.nlm.nih.gov/pubmed/24428924 http://dx.doi.org/10.1186/1471-2164-15-27 |
author | Li, Yuanyuan; Umbach, David M; Li, Leping |
---|---|
collection | PubMed |
description | BACKGROUND: A protein may bind to its target DNA sites constitutively, i.e., regardless of cell type. Intuitively, constitutive binding sites should be biologically functional. A prerequisite for understanding their functional relevance is knowing all their locations for a protein of interest. Genome-wide discovery of constitutive binding sites requires robust and efficient computational methods to integrate results from numerous binding experiments. Such methods are lacking, however. RESULTS: To locate constitutive binding sites for a protein using ChIP-seq data for that protein from multiple cell lines, we developed a method, T-KDE, which combines a binary range tree with a kernel density estimator. Using 132 CTCF (CCCTC-binding factor) ChIP-seq datasets, we showed that the number of constitutive sites identified by T-KDE is robust to the choice of tuning parameter and that T-KDE identifies binding site locations more accurately than a binning approach. Furthermore, T-KDE can identify constitutive sites that are missed by a motif-based approach either because a bound site failed to reach the motif significance cutoff or because the peak sequence scanned was too short. By studying sites declared constitutive by T-KDE but not by the motif-based approach, we discovered two new CTCF motif variants. Using ENCODE data on 22 transcription factors (TF) in 132 cell lines, we identified constitutive binding sites for each TF and provide evidence that, for some TFs, they may be biologically meaningful. CONCLUSIONS: T-KDE is an efficient and effective method to predict constitutive protein binding sites using ChIP-seq peaks from multiple cell lines. Besides constitutive binding sites for a given protein, T-KDE can identify genomic “hot spots” where several different proteins bind and, conversely, cell-type-specific sites bound by a given protein. |
format | Online Article Text |
id | pubmed-3903014 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3903014 2014-02-11 |
journal | BMC Genomics |
published | 2014-01-15 |
rights | Copyright © 2014 Li et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets |
topic | Methodology Article |
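
The abstract states only that T-KDE "combines a binary range tree with a kernel density estimator" to find sites bound across many cell lines; it does not give the algorithm. The following is a minimal, hypothetical sketch of that general idea, not the authors' implementation. It assumes peak summits for one chromosome, uses a sorted list with binary search (Python's `bisect`) as a stand-in for the range tree's 1-D range queries, and calls a density mode "constitutive" when at least `min_fraction` of cell lines have a summit within `h` bp. The names `constitutive_sites` and `gaussian_kde_at`, the bandwidth `h`, and the threshold `min_fraction` are all illustrative assumptions.

```python
import bisect
import math
from typing import Dict, List, Tuple


def gaussian_kde_at(x: float, positions: List[float], h: float) -> float:
    """Gaussian KDE at x, truncated to points within 4h (found by binary search)."""
    lo = bisect.bisect_left(positions, x - 4 * h)
    hi = bisect.bisect_right(positions, x + 4 * h)
    norm = 1.0 / (len(positions) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in positions[lo:hi])


def constitutive_sites(
    summits: Dict[str, List[int]], h: float = 50.0, min_fraction: float = 0.8
) -> List[Tuple[int, int]]:
    """Hypothetical sketch: return (position, n_cell_lines) for candidate constitutive sites."""
    # Pool every summit, remembering which cell line it came from; keep it sorted
    # so that range queries reduce to two binary searches.
    pooled = sorted((pos, cell) for cell, ps in summits.items() for pos in ps)
    positions = [p for p, _ in pooled]
    n_cells = len(summits)

    # Score each pooled summit by the kernel density of all summits around it.
    scored = sorted(
        ((gaussian_kde_at(p, positions, h), p) for p in positions), reverse=True
    )

    accepted: List[Tuple[int, int]] = []
    taken: List[int] = []  # sorted positions of density modes already kept
    for _, p in scored:
        # Non-maximum suppression: skip p if a denser mode lies within h.
        i = bisect.bisect_left(taken, p - h)
        if i < len(taken) and taken[i] <= p + h:
            continue
        bisect.insort(taken, p)
        # Range query: which distinct cell lines have a summit within h of the mode?
        lo = bisect.bisect_left(positions, p - h)
        hi = bisect.bisect_right(positions, p + h)
        cells = {cell for _, cell in pooled[lo:hi]}
        if len(cells) / n_cells >= min_fraction:
            accepted.append((p, len(cells)))
    return sorted(accepted)


summits = {"A": [1000, 5000], "B": [1010, 9000], "C": [995]}
print(constitutive_sites(summits, h=50, min_fraction=0.8))  # [(1000, 3)]
```

Lowering `min_fraction` admits sites seen in fewer cell lines, so the same routine, under these assumptions, can also list rarely bound (cell-type-specific) sites; keying `summits` by protein instead of cell line would, likewise, flag positions where many different proteins bind.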