Is this the right normalization? A diagnostic tool for ChIP-seq normalization

BACKGROUND: ChIP-seq experiments are becoming a standard approach for genome-wide profiling of protein-DNA interactions, such as detecting transcription factor binding sites, histone modification marks and RNA Polymerase II occupancy. However, when comparing a ChIP sample versus a control sample, such...

Bibliographic Details
Main authors: Angelini, Claudia; Heller, Ruth; Volkinshtein, Rita; Yekutieli, Daniel
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4448883/
https://www.ncbi.nlm.nih.gov/pubmed/25957089
http://dx.doi.org/10.1186/s12859-015-0579-z
author Angelini, Claudia
Heller, Ruth
Volkinshtein, Rita
Yekutieli, Daniel
collection PubMed
description BACKGROUND: ChIP-seq experiments are becoming a standard approach for genome-wide profiling of protein-DNA interactions, such as detecting transcription factor binding sites, histone modification marks and RNA Polymerase II occupancy. However, when comparing a ChIP sample with a control sample, such as Input DNA, normalization procedures have to be applied to remove experimental sources of bias. Despite the substantial impact that the choice of normalization method can have on the results of a ChIP-seq data analysis, the assessment of these methods has not been fully explored in the literature. In particular, there are no diagnostic tools that show whether the applied normalization is indeed appropriate for the data being analyzed. RESULTS: In this work we propose a novel diagnostic tool to examine the appropriateness of the estimated normalization procedure. By plotting the empirical densities of log relative risks in bins of equal read count, together with the log-transformed estimated normalization constant, the researcher can assess whether the estimated constant is appropriate. We use the diagnostic plot to evaluate the estimates obtained by CisGenome, NCIS and CCAT on several real data examples. Moreover, we show the impact that the choice of the normalization constant can have on standard peak-calling tools such as MACS or SICER. Finally, we propose a novel procedure for controlling the FDR using sample swapping. This procedure uses the estimated normalization constant to gain power over the naive choice of constant (used in MACS and SICER), which is the ratio of the total number of reads in the ChIP and Input samples. CONCLUSIONS: Linear normalization approaches aim to estimate a scale factor, r, to adjust for different sequencing depths when comparing ChIP versus Input samples. The estimated scaling factor can easily be incorporated into many peak-calling algorithms to improve the accuracy of peak identification. The diagnostic plot proposed in this paper can be used to assess how adequate ChIP/Input normalization constants are, and thus allows the user to choose the most adequate estimate for the analysis. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0579-z) contains supplementary material, which is available to authorized users.
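The two quantities named in the abstract, the naive constant and the per-group log ratios behind the diagnostic plot, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy counts, and the reading of "bins of equal read count" as genomic windows whose combined ChIP + Input count equals a given value are all assumptions made for illustration; the record does not specify them.

```python
import numpy as np


def naive_scale_factor(chip_counts, input_counts):
    """Naive constant from the abstract: total ChIP reads / total Input reads."""
    chip = np.asarray(chip_counts)
    inp = np.asarray(input_counts)
    return chip.sum() / inp.sum()


def log_ratios_by_total_count(chip_counts, input_counts, totals):
    """Group windows by combined read count and return log(ChIP / Input).

    ASSUMPTION: "bins of equal read count" is interpreted here as windows
    whose ChIP + Input count equals n. Windows with a zero count in either
    sample are skipped so that the log ratio is finite.
    """
    chip = np.asarray(chip_counts)
    inp = np.asarray(input_counts)
    total = chip + inp
    groups = {}
    for n in totals:
        mask = (total == n) & (chip > 0) & (inp > 0)
        groups[n] = np.log(chip[mask] / inp[mask])
    return groups


# Hypothetical per-window read counts (toy data, not from the paper).
chip = np.array([4, 2, 6, 3, 30])
inp = np.array([2, 4, 3, 6, 3])

r_naive = naive_scale_factor(chip, inp)
groups = log_ratios_by_total_count(chip, inp, totals=[6, 9, 33])

# Compare the centre of each group's log ratios with log(r).
for n, values in groups.items():
    print(n, np.median(values), np.log(r_naive))
```

In a real analysis the per-group values would be drawn as empirical densities with a vertical line at log(r) for each candidate estimate. In this toy data the single strongly enriched window raises the naive constant above what the remaining windows suggest, which is the kind of mismatch such a plot is meant to expose.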
format Online
Article
Text
id pubmed-4448883
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4448883 2015-05-30 Is this the right normalization? A diagnostic tool for ChIP-seq normalization. Angelini, Claudia; Heller, Ruth; Volkinshtein, Rita; Yekutieli, Daniel. BMC Bioinformatics. Research Article. BioMed Central 2015-05-09 /pmc/articles/PMC4448883/ /pubmed/25957089 http://dx.doi.org/10.1186/s12859-015-0579-z Text en © Angelini et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Is this the right normalization? A diagnostic tool for ChIP-seq normalization
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4448883/
https://www.ncbi.nlm.nih.gov/pubmed/25957089
http://dx.doi.org/10.1186/s12859-015-0579-z