dPeak: High Resolution Identification of Transcription Factor Binding Sites from PET and SET ChIP-Seq Data

Bibliographic Details
Main Authors: Chung, Dongjun; Park, Dan; Myers, Kevin; Grass, Jeffrey; Kiley, Patricia; Landick, Robert; Keleş, Sündüz
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3798280/
https://www.ncbi.nlm.nih.gov/pubmed/24146601
http://dx.doi.org/10.1371/journal.pcbi.1003246
author Chung, Dongjun
Park, Dan
Myers, Kevin
Grass, Jeffrey
Kiley, Patricia
Landick, Robert
Keleş, Sündüz
author_facet Chung, Dongjun
Park, Dan
Myers, Kevin
Grass, Jeffrey
Kiley, Patricia
Landick, Robert
Keleş, Sündüz
author_sort Chung, Dongjun
collection PubMed
description Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) has been successfully used for genome-wide profiling of transcription factor binding sites, histone modifications, and nucleosome occupancy in many model organisms and humans. Because the compact genomes of prokaryotes harbor many binding sites separated by only a few base pairs, applications of ChIP-Seq in this domain have not reached their full potential. Applications in prokaryotic genomes are further hampered by the fact that well-studied data analysis methods for ChIP-Seq do not achieve the resolution required for deciphering the locations of nearby binding events. We generated single-end tag (SET) and paired-end tag (PET) ChIP-Seq data for [Image: see text] factor in Escherichia coli (E. coli). Direct comparison of these datasets revealed that although the PET assay enables higher-resolution identification of binding events, standard ChIP-Seq analysis methods are not equipped to utilize PET-specific features of the data. To address this problem, we developed dPeak as a high-resolution binding site identification (deconvolution) algorithm. dPeak implements a probabilistic model that accurately describes the ChIP-Seq data generation process for both the SET and PET assays. For SET data, dPeak outperforms or performs comparably to state-of-the-art high-resolution ChIP-Seq peak deconvolution algorithms such as PICS, GPS, and GEM. When coupled with PET data, dPeak significantly outperforms SET-based analysis with any of the current state-of-the-art methods. Experimental validations of a subset of dPeak predictions from [Image: see text] PET ChIP-Seq data indicate that dPeak can estimate locations of binding events with resolution as high as [Image: see text] to [Image: see text]. Applications of dPeak to [Image: see text] ChIP-Seq data in E. coli under aerobic and anaerobic conditions reveal closely located promoters that are differentially occupied and further illustrate the importance of high-resolution analysis of ChIP-Seq data.
format Online
Article
Text
id pubmed-3798280
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3798280 2013-10-21 dPeak: High Resolution Identification of Transcription Factor Binding Sites from PET and SET ChIP-Seq Data Chung, Dongjun Park, Dan Myers, Kevin Grass, Jeffrey Kiley, Patricia Landick, Robert Keleş, Sündüz PLoS Comput Biol Research Article Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) has been successfully used for genome-wide profiling of transcription factor binding sites, histone modifications, and nucleosome occupancy in many model organisms and humans. Because the compact genomes of prokaryotes harbor many binding sites separated by only a few base pairs, applications of ChIP-Seq in this domain have not reached their full potential. Applications in prokaryotic genomes are further hampered by the fact that well-studied data analysis methods for ChIP-Seq do not achieve the resolution required for deciphering the locations of nearby binding events. We generated single-end tag (SET) and paired-end tag (PET) ChIP-Seq data for [Image: see text] factor in Escherichia coli (E. coli). Direct comparison of these datasets revealed that although the PET assay enables higher-resolution identification of binding events, standard ChIP-Seq analysis methods are not equipped to utilize PET-specific features of the data. To address this problem, we developed dPeak as a high-resolution binding site identification (deconvolution) algorithm. dPeak implements a probabilistic model that accurately describes the ChIP-Seq data generation process for both the SET and PET assays. For SET data, dPeak outperforms or performs comparably to state-of-the-art high-resolution ChIP-Seq peak deconvolution algorithms such as PICS, GPS, and GEM. When coupled with PET data, dPeak significantly outperforms SET-based analysis with any of the current state-of-the-art methods. Experimental validations of a subset of dPeak predictions from [Image: see text] PET ChIP-Seq data indicate that dPeak can estimate locations of binding events with resolution as high as [Image: see text] to [Image: see text]. Applications of dPeak to [Image: see text] ChIP-Seq data in E. coli under aerobic and anaerobic conditions reveal closely located promoters that are differentially occupied and further illustrate the importance of high-resolution analysis of ChIP-Seq data. Public Library of Science 2013-10-17 /pmc/articles/PMC3798280/ /pubmed/24146601 http://dx.doi.org/10.1371/journal.pcbi.1003246 Text en © 2013 Chung et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Chung, Dongjun
Park, Dan
Myers, Kevin
Grass, Jeffrey
Kiley, Patricia
Landick, Robert
Keleş, Sündüz
dPeak: High Resolution Identification of Transcription Factor Binding Sites from PET and SET ChIP-Seq Data
title dPeak: High Resolution Identification of Transcription Factor Binding Sites from PET and SET ChIP-Seq Data
title_full dPeak: High Resolution Identification of Transcription Factor Binding Sites from PET and SET ChIP-Seq Data
title_fullStr dPeak: High Resolution Identification of Transcription Factor Binding Sites from PET and SET ChIP-Seq Data
title_full_unstemmed dPeak: High Resolution Identification of Transcription Factor Binding Sites from PET and SET ChIP-Seq Data
title_short dPeak: High Resolution Identification of Transcription Factor Binding Sites from PET and SET ChIP-Seq Data
title_sort dpeak: high resolution identification of transcription factor binding sites from pet and set chip-seq data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3798280/
https://www.ncbi.nlm.nih.gov/pubmed/24146601
http://dx.doi.org/10.1371/journal.pcbi.1003246
work_keys_str_mv AT chungdongjun dpeakhighresolutionidentificationoftranscriptionfactorbindingsitesfrompetandsetchipseqdata
AT parkdan dpeakhighresolutionidentificationoftranscriptionfactorbindingsitesfrompetandsetchipseqdata
AT myerskevin dpeakhighresolutionidentificationoftranscriptionfactorbindingsitesfrompetandsetchipseqdata
AT grassjeffrey dpeakhighresolutionidentificationoftranscriptionfactorbindingsitesfrompetandsetchipseqdata
AT kileypatricia dpeakhighresolutionidentificationoftranscriptionfactorbindingsitesfrompetandsetchipseqdata
AT landickrobert dpeakhighresolutionidentificationoftranscriptionfactorbindingsitesfrompetandsetchipseqdata
AT kelessunduz dpeakhighresolutionidentificationoftranscriptionfactorbindingsitesfrompetandsetchipseqdata