
RECAP reveals the true statistical significance of ChIP-seq peak calls

MOTIVATION: Chromatin Immunoprecipitation (ChIP)-seq is used extensively to identify sites of transcription factor binding or regions of epigenetic modifications to the genome. A key step in ChIP-seq analysis is peak calling, where genomic regions enriched for ChIP versus control reads are identified...


Bibliographic Details
Main authors: Chitpin, Justin G, Awdeh, Aseel, Perkins, Theodore J
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6761936/
https://www.ncbi.nlm.nih.gov/pubmed/30824903
http://dx.doi.org/10.1093/bioinformatics/btz150
author Chitpin, Justin G
Awdeh, Aseel
Perkins, Theodore J
collection PubMed
description MOTIVATION: Chromatin Immunoprecipitation (ChIP)-seq is used extensively to identify sites of transcription factor binding or regions of epigenetic modifications to the genome. A key step in ChIP-seq analysis is peak calling, where genomic regions enriched for ChIP versus control reads are identified. Many programs have been designed to solve this task, but nearly all fall into the statistical trap of using the data twice: once to determine candidate enriched regions, and again to assess enrichment by classical statistical hypothesis testing. This double use of the data invalidates the statistical significance assigned to enriched regions, so the true significance or reliability of peak calls remains unknown. RESULTS: Using simulated and real ChIP-seq data, we show that three well-known peak callers, MACS, SICER and diffReps, output biased P-values and false discovery rate estimates that can be many orders of magnitude too optimistic. We propose a wrapper algorithm, RECAP, that uses resampling of ChIP-seq and control data to estimate a monotone transform correcting for biases built into peak calling algorithms. When applied to null hypothesis data, where there is no enrichment between ChIP-seq and control, P-values recalibrated by RECAP are approximately uniformly distributed. On data where there is genuine enrichment, RECAP P-values give a better estimate of the true statistical significance of candidate peaks and better false discovery rate estimates, which correlate better with empirical reproducibility. RECAP is a powerful new tool for assessing the true statistical significance of ChIP-seq peak calls. AVAILABILITY AND IMPLEMENTATION: The RECAP software is available through www.perkinslab.ca or on GitHub at https://github.com/theodorejperkins/RECAP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
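The recalibration idea in the abstract can be illustrated with a small simulation. The sketch below is not the RECAP implementation. It assumes that the monotone transform is the empirical distribution function of P-values obtained under the null, which is one common way to build such a transform and is not confirmed by the abstract. The function names `recalibrate` and `optimistic_null_p` are hypothetical, and the "peak caller" is a toy stand-in that reports the smallest of many uniform P-values, mimicking the double use of the data.

```python
import bisect
import random


def recalibrate(raw_p, null_p):
    """Map each raw P-value to the (smoothed) fraction of null P-values at or below it.

    Assumption: the monotone transform is an empirical CDF of null P-values.
    The add-one smoothing keeps recalibrated values strictly above zero.
    """
    null_sorted = sorted(null_p)
    n = len(null_sorted)
    return [(bisect.bisect_right(null_sorted, p) + 1) / (n + 1) for p in raw_p]


def optimistic_null_p(rng, windows=50):
    """Toy stand-in for a peak caller run on data with no enrichment.

    Picking the best of `windows` candidate regions and then reporting its
    P-value unadjusted is what makes the result too optimistic.
    """
    return min(rng.random() for _ in range(windows))


rng = random.Random(0)

# P-values from "resampled" null data, used only to estimate the transform.
null_p = [optimistic_null_p(rng) for _ in range(5000)]

# P-values from a second, independent null data set, to be recalibrated.
raw_p = [optimistic_null_p(rng) for _ in range(5000)]
recal_p = recalibrate(raw_p, null_p)

raw_mean = sum(raw_p) / len(raw_p)        # far below 0.5: biased toward significance
recal_mean = sum(recal_p) / len(recal_p)  # close to 0.5: roughly uniform

print(f"mean raw P-value:          {raw_mean:.3f}")
print(f"mean recalibrated P-value: {recal_mean:.3f}")
```

Because the transform is non-decreasing, it preserves the ranking of candidate peaks and changes only the significance attached to each one.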
format Online
Article
Text
id pubmed-6761936
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6761936 2019-10-02 RECAP reveals the true statistical significance of ChIP-seq peak calls Chitpin, Justin G Awdeh, Aseel Perkins, Theodore J Bioinformatics Original Papers Oxford University Press 2019-10-01 2019-03-01 /pmc/articles/PMC6761936/ /pubmed/30824903 http://dx.doi.org/10.1093/bioinformatics/btz150 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title RECAP reveals the true statistical significance of ChIP-seq peak calls
topic Original Papers