Characterizing batch effects and binding site-specific variability in ChIP-seq data

Multiple sources of variability can bias ChIP-seq data toward inferring transcription factor (TF) binding profiles. As ChIP-seq datasets increase in public repositories, it is now possible and necessary to account for complex sources of variability in ChIP-seq data analysis. We find that two types o...

Bibliographic Details
Main authors: Teng, Mingxiang; Du, Dongliang; Chen, Danfeng; Irizarry, Rafael A
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8515842/
https://www.ncbi.nlm.nih.gov/pubmed/34661103
http://dx.doi.org/10.1093/nargab/lqab098
author Teng, Mingxiang
Du, Dongliang
Chen, Danfeng
Irizarry, Rafael A
collection PubMed
description Multiple sources of variability can bias ChIP-seq data toward inferring transcription factor (TF) binding profiles. As ChIP-seq datasets increase in public repositories, it is now possible and necessary to account for complex sources of variability in ChIP-seq data analysis. We find that two types of variability, the batch effects by sequencing laboratories and differences between biological replicates, not associated with changes in condition or state, vary across genomic sites. This implies that observed differences between samples from different conditions or states, such as cell-type, must be assessed statistically, with an understanding of the distribution of obscuring noise. We present a statistical approach that characterizes both differences of interest and these sources of variability through the parameters of a mixed effects model. We demonstrate the utility of our approach on a CTCF binding dataset composed of 211 samples representing 90 different cell-types measured across three different laboratories. The results revealed that sites exhibiting large variability were associated with sequence characteristics such as GC-content and low complexity. Finally, we identified TFs associated with high-variance CTCF sites using TF motifs documented in public databases, pointing to the possibility of these being false positives if the sources of variability are not properly accounted for.
format Online
Article
Text
id pubmed-8515842
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8515842 2021-10-15 NAR Genom Bioinform Standard Article Oxford University Press 2021-10-14 /pmc/articles/PMC8515842/ /pubmed/34661103 http://dx.doi.org/10.1093/nargab/lqab098 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Characterizing batch effects and binding site-specific variability in ChIP-seq data
topic Standard Article