Is this the right normalization? A diagnostic tool for ChIP-seq normalization
BACKGROUND: ChIP-seq experiments are becoming a standard approach for genome-wide profiling of protein-DNA interactions, such as detecting transcription factor binding sites, histone modification marks and RNA Polymerase II occupancy. However, when comparing a ChIP sample versus a control sample, such...
Main authors: | Angelini, Claudia; Heller, Ruth; Volkinshtein, Rita; Yekutieli, Daniel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4448883/ https://www.ncbi.nlm.nih.gov/pubmed/25957089 http://dx.doi.org/10.1186/s12859-015-0579-z |
_version_ | 1782373782463184896 |
---|---|
author | Angelini, Claudia Heller, Ruth Volkinshtein, Rita Yekutieli, Daniel |
author_facet | Angelini, Claudia Heller, Ruth Volkinshtein, Rita Yekutieli, Daniel |
author_sort | Angelini, Claudia |
collection | PubMed |
description | BACKGROUND: ChIP-seq experiments are becoming a standard approach for genome-wide profiling of protein-DNA interactions, such as detecting transcription factor binding sites, histone modification marks and RNA Polymerase II occupancy. However, when comparing a ChIP sample versus a control sample, such as Input DNA, normalization procedures have to be applied in order to remove experimental sources of bias. Despite the substantial impact that the choice of the normalization method can have on the results of a ChIP-seq data analysis, the assessment of these methods has not been fully explored in the literature. In particular, there are no diagnostic tools that show whether the applied normalization is indeed appropriate for the data being analyzed. RESULTS: In this work we propose a novel diagnostic tool to examine the appropriateness of the estimated normalization procedure. By plotting the empirical densities of log relative risks in bins of equal read count, along with the estimated normalization constant, after logarithmic transformation, the researcher is able to assess the appropriateness of the estimated normalization constant. We use the diagnostic plot to evaluate the appropriateness of the estimates obtained by CisGenome, NCIS and CCAT on several real data examples. Moreover, we show the impact that the choice of the normalization constant can have on standard tools for peak calling such as MACS or SICER. Finally, we propose a novel procedure for controlling the FDR using sample swapping. This procedure makes use of the estimated normalization constant in order to gain power over the naive choice of constant (used in MACS and SICER), which is the ratio of the total number of reads in the ChIP and Input samples. CONCLUSIONS: Linear normalization approaches aim to estimate a scale factor, r, to adjust for different sequencing depths when comparing ChIP versus Input samples. The estimated scaling factor can easily be incorporated in many peak-calling algorithms to improve the accuracy of the peak identification. The diagnostic plot proposed in this paper can be used to assess how adequate ChIP/Input normalization constants are, and thus it allows the user to choose the most adequate estimate for the analysis. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0579-z) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4448883 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4448883 2015-05-30 Is this the right normalization? A diagnostic tool for ChIP-seq normalization Angelini, Claudia Heller, Ruth Volkinshtein, Rita Yekutieli, Daniel BMC Bioinformatics Research Article BioMed Central 2015-05-09 /pmc/articles/PMC4448883/ /pubmed/25957089 http://dx.doi.org/10.1186/s12859-015-0579-z Text en © Angelini et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Angelini, Claudia Heller, Ruth Volkinshtein, Rita Yekutieli, Daniel Is this the right normalization? A diagnostic tool for ChIP-seq normalization |
title | Is this the right normalization? A diagnostic tool for ChIP-seq normalization |
title_full | Is this the right normalization? A diagnostic tool for ChIP-seq normalization |
title_fullStr | Is this the right normalization? A diagnostic tool for ChIP-seq normalization |
title_full_unstemmed | Is this the right normalization? A diagnostic tool for ChIP-seq normalization |
title_short | Is this the right normalization? A diagnostic tool for ChIP-seq normalization |
title_sort | is this the right normalization? a diagnostic tool for chip-seq normalization |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4448883/ https://www.ncbi.nlm.nih.gov/pubmed/25957089 http://dx.doi.org/10.1186/s12859-015-0579-z |
work_keys_str_mv | AT angeliniclaudia isthistherightnormalizationadiagnostictoolforchipseqnormalization AT hellerruth isthistherightnormalizationadiagnostictoolforchipseqnormalization AT volkinshteinrita isthistherightnormalizationadiagnostictoolforchipseqnormalization AT yekutielidaniel isthistherightnormalizationadiagnostictoolforchipseqnormalization |
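
The abstract in this record mentions two quantities that a short example may make more concrete: the naive normalization constant (the ratio of the total number of reads in the ChIP and Input samples) and the log ratios of ChIP to Input counts grouped by read count. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names, the per-window count lists, the grouping by total (ChIP plus Input) count, and the 0.5 pseudocount are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def naive_scale_factor(chip_counts, input_counts):
    """Naive constant: total ChIP reads divided by total Input reads."""
    total_input = sum(input_counts)
    if total_input == 0:
        raise ValueError("Input sample has no reads")
    return sum(chip_counts) / total_input


def log_ratios_by_total(chip_counts, input_counts, pseudocount=0.5):
    """Group per-window log(ChIP/Input) ratios by the window's total count.

    The pseudocount (an assumption of this sketch) avoids log(0) and
    division by zero in windows where one sample has no reads.
    """
    groups = defaultdict(list)
    for c, u in zip(chip_counts, input_counts):
        total = c + u
        if total == 0:
            continue  # empty windows carry no information
        ratio = (c + pseudocount) / (u + pseudocount)
        groups[total].append(math.log(ratio))
    return dict(groups)


# Hypothetical per-window read counts, for illustration only.
chip = [3, 1, 2, 10, 0, 4]
inp = [1, 3, 2, 2, 4, 0]

r = naive_scale_factor(chip, inp)
groups = log_ratios_by_total(chip, inp)

# For each group of windows, the spread of log ratios can be compared
# against log(r) to judge whether r sits where background windows cluster.
for total in sorted(groups):
    print(total, [round(v, 3) for v in groups[total]], round(math.log(r), 3))
```

In a real analysis the grouped log ratios would be plotted as empirical densities, one per group, with a vertical line at the logarithm of each candidate constant (for example, estimates from CisGenome, NCIS or CCAT as named in the abstract).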