Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-seq data

Bibliographic Details
Main authors:	Teng, Mingxiang; Irizarry, Rafael A.
Format:	Online Article Text
Language:	English
Published:	Cold Spring Harbor Laboratory Press 2017
Subjects:	Method
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5668949/ https://www.ncbi.nlm.nih.gov/pubmed/29025895 http://dx.doi.org/10.1101/gr.220673.117

author	Teng, Mingxiang; Irizarry, Rafael A.
collection	PubMed
description	The main application of ChIP-seq technology is the detection of genomic regions that bind to a protein of interest. A large part of functional genomics’ public catalogs is based on ChIP-seq data. These catalogs rely on peak calling algorithms that infer protein-binding sites by detecting genomic regions associated with more mapped reads (coverage) than expected by chance, as a result of the experimental protocol's lack of perfect specificity. We find that GC-content bias accounts for substantial variability in the observed coverage for ChIP-seq experiments and that this variability leads to false-positive peak calls. More concerning is that the GC effect varies across experiments, with the effect strong enough to result in a substantial number of peaks called differently when different laboratories perform experiments on the same cell line. However, accounting for GC-content bias in ChIP-seq is challenging because the binding sites of interest tend to be more common in high GC-content regions, which confounds real biological signals with unwanted variability. To address this challenge, we introduce a statistical approach that accounts for GC effects on both nonspecific noise and signal induced by the binding site. The method can be used to account for this bias in binding quantification as well as to improve existing peak calling algorithms. We use this approach to show a reduction in false-positive peaks as well as improved consistency across laboratories.
format	Online Article Text
id	pubmed-5668949
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Cold Spring Harbor Laboratory Press
record_format	MEDLINE/PubMed
spelling	pubmed-5668949 2018-05-01 Genome Res Method Cold Spring Harbor Laboratory Press 2017-11 Text en © 2017 Teng and Irizarry; Published by Cold Spring Harbor Laboratory Press. This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title	Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-seq data
topic	Method

Similar items