Theoretical characterisation of strand cross-correlation in ChIP-seq

BACKGROUND: Strand cross-correlation profiles are used for both peak calling pre-analysis and quality control (QC) in chromatin immunoprecipitation followed by sequencing (ChIP-seq) analysis. Despite its potential for robust and accurate assessments of signal-to-noise ratio (S/N) because of its peak...

Bibliographic Details
Main authors: Anzawa, Hayato; Yamagata, Hitoshi; Kinoshita, Kengo
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7510163/
https://www.ncbi.nlm.nih.gov/pubmed/32962634
http://dx.doi.org/10.1186/s12859-020-03729-6
author Anzawa, Hayato
Yamagata, Hitoshi
Kinoshita, Kengo
collection PubMed
description BACKGROUND: Strand cross-correlation profiles are used for both peak calling pre-analysis and quality control (QC) in chromatin immunoprecipitation followed by sequencing (ChIP-seq) analysis. Despite their potential for robust and accurate assessment of the signal-to-noise ratio (S/N), which stems from their independence from peak calling, it remains unclear what aspects of quality such strand cross-correlation profiles actually measure. RESULTS: We introduced a simple model to simulate the mapped read density of ChIP-seq and then derived the theoretical maximum and minimum of the cross-correlation coefficients between strands. The results suggest that the maximum coefficient of typical ChIP-seq samples is directly proportional to the number of total mapped reads and the square of the ratio of signal reads, and inversely proportional to the number of peaks and the length of read-enriched regions. Simulation analysis supported our results, and evaluation using 790 ChIP-seq datasets obtained from a public database demonstrated high consistency between calculated cross-correlation coefficients and coefficients estimated from the theoretical relations and peak calling results. In addition, we found that mappability bias correction improved sensitivity, enabling differentiation of maximum coefficients from the noise level. Based on these insights, we proposed virtual S/N (VSN), a novel peak call-free metric for S/N assessment. We also developed PyMaSC, a tool to calculate strand cross-correlation and VSN efficiently. VSN achieved the most consistent S/N estimation for various ChIP targets and sequencing read depths. Furthermore, we demonstrated that a combination of VSN and pre-existing peak calling results enables the estimation of the number of detectable peaks in subsequent experiments and the assessment of peak calling results.
CONCLUSIONS: We present the first theoretical insights into strand cross-correlation, and the results reveal the potential and the limitations of strand cross-correlation analysis. Our quality assessment framework using VSN provides peak call-independent QC and will help in the evaluation of peak call analysis in ChIP-seq experiments.
format Online
Article
Text
id pubmed-7510163
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7510163 2020-09-25 Theoretical characterisation of strand cross-correlation in ChIP-seq Anzawa, Hayato Yamagata, Hitoshi Kinoshita, Kengo BMC Bioinformatics Research Article BioMed Central 2020-09-22 /pmc/articles/PMC7510163/ /pubmed/32962634 http://dx.doi.org/10.1186/s12859-020-03729-6 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Theoretical characterisation of strand cross-correlation in ChIP-seq
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7510163/
https://www.ncbi.nlm.nih.gov/pubmed/32962634
http://dx.doi.org/10.1186/s12859-020-03729-6
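
The abstract in this record refers to strand cross-correlation profiles without defining them. As a rough illustration only, the sketch below computes the Pearson correlation between forward-strand and reverse-strand read-start counts at each strand shift, which is the usual textbook form of the profile. It is not the model derived in the article and not PyMaSC's implementation: the function names, the toy genome, and the 7 bp offset are all hypothetical, and mappability bias correction and VSN are not covered.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def strand_cross_correlation(forward, reverse, max_shift):
    """Correlate forward[i] with reverse[i + d] for every shift d in 0..max_shift.

    forward and reverse are per-base counts of read 5' ends on each strand
    (hypothetical inputs for this sketch).
    """
    if len(forward) != len(reverse):
        raise ValueError("strand arrays must have the same length")
    n = len(forward)
    if not 0 <= max_shift < n:
        raise ValueError("max_shift must be between 0 and len(forward) - 1")
    return [pearson(forward[:n - d], reverse[d:]) for d in range(max_shift + 1)]


# Toy data (assumed, not from the article): every forward-strand pile-up has a
# matching reverse-strand pile-up exactly `offset` bases downstream.
genome_length = 50
offset = 7
forward = [0] * genome_length
reverse = [0] * genome_length
for position, count in [(10, 5), (30, 3)]:
    forward[position] = count
    reverse[position + offset] = count

profile = strand_cross_correlation(forward, reverse, max_shift=20)
best_shift = max(range(len(profile)), key=profile.__getitem__)
print("shift with maximum coefficient:", best_shift)
```

In this toy case the profile peaks at the shift equal to the offset between the strands. In real data that offset reflects the fragment length, and the height of the peak is the maximum coefficient that the abstract relates to read depth, signal ratio, peak number, and enriched-region length.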