Saturation analysis of ChIP-seq data for reproducible identification of binding peaks

Chromatin immunoprecipitation coupled with next-generation sequencing (ChIP-seq) is a powerful technology to identify the genome-wide locations of transcription factors and other DNA binding proteins. Computational ChIP-seq peak calling infers the location of protein–DNA interactions based on variou...

Bibliographic Details
Main authors: Hansen, Peter; Hecht, Jochen; Ibrahim, Daniel M.; Krannich, Alexander; Truss, Matthias; Robinson, Peter N.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4561497/
https://www.ncbi.nlm.nih.gov/pubmed/26163319
http://dx.doi.org/10.1101/gr.189894.115
author Hansen, Peter
Hecht, Jochen
Ibrahim, Daniel M.
Krannich, Alexander
Truss, Matthias
Robinson, Peter N.
collection PubMed
description Chromatin immunoprecipitation coupled with next-generation sequencing (ChIP-seq) is a powerful technology to identify the genome-wide locations of transcription factors and other DNA binding proteins. Computational ChIP-seq peak calling infers the location of protein–DNA interactions based on various measures of enrichment of sequence reads. In this work, we introduce an algorithm, Q, that uses an assessment of the quadratic enrichment of reads to center candidate peaks followed by statistical analysis of saturation of candidate peaks by 5′ ends of reads. We show that our method not only is substantially faster than several competing methods but also demonstrates statistically significant advantages with respect to reproducibility of results and in its ability to identify peaks with reproducible binding site motifs. We show that Q has superior performance in the delineation of double RNAPII and H3K4me3 peaks surrounding transcription start sites related to a better ability to resolve individual peaks. The method is implemented in C++ and is freely available under an open source license.
format Online
Article
Text
id pubmed-4561497
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-4561497 2016-03-01 Saturation analysis of ChIP-seq data for reproducible identification of binding peaks Hansen, Peter Hecht, Jochen Ibrahim, Daniel M. Krannich, Alexander Truss, Matthias Robinson, Peter N. Genome Res Method Chromatin immunoprecipitation coupled with next-generation sequencing (ChIP-seq) is a powerful technology to identify the genome-wide locations of transcription factors and other DNA binding proteins. Computational ChIP-seq peak calling infers the location of protein–DNA interactions based on various measures of enrichment of sequence reads. In this work, we introduce an algorithm, Q, that uses an assessment of the quadratic enrichment of reads to center candidate peaks followed by statistical analysis of saturation of candidate peaks by 5′ ends of reads. We show that our method not only is substantially faster than several competing methods but also demonstrates statistically significant advantages with respect to reproducibility of results and in its ability to identify peaks with reproducible binding site motifs. We show that Q has superior performance in the delineation of double RNAPII and H3K4me3 peaks surrounding transcription start sites related to a better ability to resolve individual peaks. The method is implemented in C++ and is freely available under an open source license. Cold Spring Harbor Laboratory Press 2015-09 /pmc/articles/PMC4561497/ /pubmed/26163319 http://dx.doi.org/10.1101/gr.189894.115 Text en © 2015 Hansen et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title Saturation analysis of ChIP-seq data for reproducible identification of binding peaks
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4561497/
https://www.ncbi.nlm.nih.gov/pubmed/26163319
http://dx.doi.org/10.1101/gr.189894.115