OccuPeak: ChIP-Seq Peak Calling Based on Internal Background Modelling


Bibliographic Details
Main authors: de Boer, Bouke A., van Duijvenboden, Karel, van den Boogaard, Malou, Christoffels, Vincent M., Barnett, Phil, Ruijter, Jan M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4061025/
https://www.ncbi.nlm.nih.gov/pubmed/24936875
http://dx.doi.org/10.1371/journal.pone.0099844
author de Boer, Bouke A.
van Duijvenboden, Karel
van den Boogaard, Malou
Christoffels, Vincent M.
Barnett, Phil
Ruijter, Jan M.
collection PubMed
description ChIP-seq has become a major tool for the genome-wide identification of transcription factor binding or histone modification sites. Most peak-calling algorithms require input control datasets to model the occurrence of background reads and to account for local sequencing and GC bias. However, the GC-content of reads in Input-seq datasets deviates significantly from that in ChIP-seq datasets. Moreover, we observed that a commonly used peak-calling program performed equally well with a simulated uniform background set as with an Input-seq dataset. This contradicts the assumption that input control datasets are necessary to faithfully reflect the background read distribution. Because the GC-content of the abundant single reads in ChIP-seq datasets is similar to that of randomly sampled regions, we designed a peak-calling algorithm with a background model based on overlapping single reads. The application, OccuPeak, uses the abundant low-frequency tags present in each ChIP-seq dataset to model the background, thereby avoiding the need for additional datasets. Analysis of the performance of OccuPeak showed robust model parameters. Its measure of peak significance, the excess ratio, depends only on the tag density of a peak and the global noise level. Compared to the commonly used peak-calling applications MACS and CisGenome, OccuPeak had the highest sensitivity in an enhancer identification benchmark test, and performed similarly in overlap tests of transcription factor occupation with DNase I hypersensitive sites and H3K27ac sites. Moreover, peaks called by OccuPeak were significantly enriched with cardiac disease-associated SNPs. OccuPeak runs as a standalone application and does not require extensive tweaking of parameters, making its use straightforward and user-friendly. Availability: http://occupeak.hfrc.nl
format Online
Article
Text
id pubmed-4061025
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4061025 2014-06-20 PLoS One Research Article Public Library of Science 2014-06-17 /pmc/articles/PMC4061025/ /pubmed/24936875 http://dx.doi.org/10.1371/journal.pone.0099844 Text en © 2014 de Boer et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title OccuPeak: ChIP-Seq Peak Calling Based on Internal Background Modelling
topic Research Article