Characterizing batch effects and binding site-specific variability in ChIP-seq data
Multiple sources of variability can bias ChIP-seq data toward inferring transcription factor (TF) binding profiles. As ChIP-seq datasets increase in public repositories, it is now possible and necessary to account for complex sources of variability in ChIP-seq data analysis. We find that two types o...
Main authors: | Teng, Mingxiang; Du, Dongliang; Chen, Danfeng; Irizarry, Rafael A |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8515842/ https://www.ncbi.nlm.nih.gov/pubmed/34661103 http://dx.doi.org/10.1093/nargab/lqab098 |
_version_ | 1784583695218245632 |
---|---|
author | Teng, Mingxiang Du, Dongliang Chen, Danfeng Irizarry, Rafael A |
author_facet | Teng, Mingxiang Du, Dongliang Chen, Danfeng Irizarry, Rafael A |
author_sort | Teng, Mingxiang |
collection | PubMed |
description | Multiple sources of variability can bias ChIP-seq data toward inferring transcription factor (TF) binding profiles. As ChIP-seq datasets increase in public repositories, it is now possible and necessary to account for complex sources of variability in ChIP-seq data analysis. We find that two types of variability, the batch effects by sequencing laboratories and differences between biological replicates, not associated with changes in condition or state, vary across genomic sites. This implies that observed differences between samples from different conditions or states, such as cell-type, must be assessed statistically, with an understanding of the distribution of obscuring noise. We present a statistical approach that characterizes both differences of interest and these sources of variability through the parameters of a mixed effects model. We demonstrate the utility of our approach on a CTCF binding dataset composed of 211 samples representing 90 different cell-types measured across three different laboratories. The results revealed that sites exhibiting large variability were associated with sequence characteristics such as GC-content and low complexity. Finally, we identified TFs associated with high-variance CTCF sites using TF motifs documented in public databases, pointing to the possibility of these being false positives if the sources of variability are not properly accounted for. |
format | Online Article Text |
id | pubmed-8515842 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-85158422021-10-15 Characterizing batch effects and binding site-specific variability in ChIP-seq data Teng, Mingxiang Du, Dongliang Chen, Danfeng Irizarry, Rafael A NAR Genom Bioinform Standard Article Multiple sources of variability can bias ChIP-seq data toward inferring transcription factor (TF) binding profiles. As ChIP-seq datasets increase in public repositories, it is now possible and necessary to account for complex sources of variability in ChIP-seq data analysis. We find that two types of variability, the batch effects by sequencing laboratories and differences between biological replicates, not associated with changes in condition or state, vary across genomic sites. This implies that observed differences between samples from different conditions or states, such as cell-type, must be assessed statistically, with an understanding of the distribution of obscuring noise. We present a statistical approach that characterizes both differences of interest and these sources of variability through the parameters of a mixed effects model. We demonstrate the utility of our approach on a CTCF binding dataset composed of 211 samples representing 90 different cell-types measured across three different laboratories. The results revealed that sites exhibiting large variability were associated with sequence characteristics such as GC-content and low complexity. Finally, we identified TFs associated with high-variance CTCF sites using TF motifs documented in public databases, pointing to the possibility of these being false positives if the sources of variability are not properly accounted for. Oxford University Press 2021-10-14 /pmc/articles/PMC8515842/ /pubmed/34661103 http://dx.doi.org/10.1093/nargab/lqab098 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Standard Article Teng, Mingxiang Du, Dongliang Chen, Danfeng Irizarry, Rafael A Characterizing batch effects and binding site-specific variability in ChIP-seq data |
title | Characterizing batch effects and binding site-specific variability in ChIP-seq data |
title_full | Characterizing batch effects and binding site-specific variability in ChIP-seq data |
title_fullStr | Characterizing batch effects and binding site-specific variability in ChIP-seq data |
title_full_unstemmed | Characterizing batch effects and binding site-specific variability in ChIP-seq data |
title_short | Characterizing batch effects and binding site-specific variability in ChIP-seq data |
title_sort | characterizing batch effects and binding site-specific variability in chip-seq data |
topic | Standard Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8515842/ https://www.ncbi.nlm.nih.gov/pubmed/34661103 http://dx.doi.org/10.1093/nargab/lqab098 |
work_keys_str_mv | AT tengmingxiang characterizingbatcheffectsandbindingsitespecificvariabilityinchipseqdata AT dudongliang characterizingbatcheffectsandbindingsitespecificvariabilityinchipseqdata AT chendanfeng characterizingbatcheffectsandbindingsitespecificvariabilityinchipseqdata AT irizarryrafaela characterizingbatcheffectsandbindingsitespecificvariabilityinchipseqdata |