ChIPulate: A comprehensive ChIP-seq simulation pipeline
ChIP-seq (Chromatin Immunoprecipitation followed by sequencing) is a high-throughput technique to identify genomic regions that are bound in vivo by a particular protein, e.g., a transcription factor (TF). Biological factors, such as chromatin state, indirect and cooperative binding, as well as expe...
Main Authors: | Datta, Vishaka; Hannenhalli, Sridhar; Siddharthan, Rahul |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2019 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6445533/ https://www.ncbi.nlm.nih.gov/pubmed/30897079 http://dx.doi.org/10.1371/journal.pcbi.1006921 |
_version_ | 1783408216241078272 |
---|---|
author | Datta, Vishaka Hannenhalli, Sridhar Siddharthan, Rahul |
author_facet | Datta, Vishaka Hannenhalli, Sridhar Siddharthan, Rahul |
author_sort | Datta, Vishaka |
collection | PubMed |
description | ChIP-seq (Chromatin Immunoprecipitation followed by sequencing) is a high-throughput technique to identify genomic regions that are bound in vivo by a particular protein, e.g., a transcription factor (TF). Biological factors, such as chromatin state, indirect and cooperative binding, as well as experimental factors, such as antibody quality, cross-linking, and PCR biases, are known to affect the outcome of ChIP-seq experiments. However, the relative impact of these factors on inferences made from ChIP-seq data is not entirely clear. Here, via a detailed ChIP-seq simulation pipeline, ChIPulate, we assess the impact of various biological and experimental sources of variation on several outcomes of a ChIP-seq experiment, viz., the recoverability of the TF binding motif, accuracy of TF-DNA binding detection, the sensitivity of inferred TF-DNA binding strength, and number of replicates needed to confidently infer binding strength. We find that the TF motif can be recovered despite poor and non-uniform extraction and PCR amplification efficiencies. The recovery of the motif is, however, affected to a larger extent by the fraction of sites that are either cooperatively or indirectly bound. Importantly, our simulations reveal that the number of ChIP-seq replicates needed to accurately measure in vivo occupancy at high-affinity sites is larger than the recommended community standards. Our results establish statistical limits on the accuracy of inferences of protein-DNA binding from ChIP-seq and suggest that increasing the mean extraction efficiency, rather than amplification efficiency, would better improve sensitivity. The source code and instructions for running ChIPulate can be found at https://github.com/vishakad/chipulate. |
format | Online Article Text |
id | pubmed-6445533 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-64455332019-04-17 ChIPulate: A comprehensive ChIP-seq simulation pipeline Datta, Vishaka Hannenhalli, Sridhar Siddharthan, Rahul PLoS Comput Biol Research Article ChIP-seq (Chromatin Immunoprecipitation followed by sequencing) is a high-throughput technique to identify genomic regions that are bound in vivo by a particular protein, e.g., a transcription factor (TF). Biological factors, such as chromatin state, indirect and cooperative binding, as well as experimental factors, such as antibody quality, cross-linking, and PCR biases, are known to affect the outcome of ChIP-seq experiments. However, the relative impact of these factors on inferences made from ChIP-seq data is not entirely clear. Here, via a detailed ChIP-seq simulation pipeline, ChIPulate, we assess the impact of various biological and experimental sources of variation on several outcomes of a ChIP-seq experiment, viz., the recoverability of the TF binding motif, accuracy of TF-DNA binding detection, the sensitivity of inferred TF-DNA binding strength, and number of replicates needed to confidently infer binding strength. We find that the TF motif can be recovered despite poor and non-uniform extraction and PCR amplification efficiencies. The recovery of the motif is, however, affected to a larger extent by the fraction of sites that are either cooperatively or indirectly bound. Importantly, our simulations reveal that the number of ChIP-seq replicates needed to accurately measure in vivo occupancy at high-affinity sites is larger than the recommended community standards. Our results establish statistical limits on the accuracy of inferences of protein-DNA binding from ChIP-seq and suggest that increasing the mean extraction efficiency, rather than amplification efficiency, would better improve sensitivity. The source code and instructions for running ChIPulate can be found at https://github.com/vishakad/chipulate. 
Public Library of Science 2019-03-21 /pmc/articles/PMC6445533/ /pubmed/30897079 http://dx.doi.org/10.1371/journal.pcbi.1006921 Text en © 2019 Datta et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Datta, Vishaka Hannenhalli, Sridhar Siddharthan, Rahul ChIPulate: A comprehensive ChIP-seq simulation pipeline |
title | ChIPulate: A comprehensive ChIP-seq simulation pipeline |
title_full | ChIPulate: A comprehensive ChIP-seq simulation pipeline |
title_fullStr | ChIPulate: A comprehensive ChIP-seq simulation pipeline |
title_full_unstemmed | ChIPulate: A comprehensive ChIP-seq simulation pipeline |
title_short | ChIPulate: A comprehensive ChIP-seq simulation pipeline |
title_sort | chipulate: a comprehensive chip-seq simulation pipeline |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6445533/ https://www.ncbi.nlm.nih.gov/pubmed/30897079 http://dx.doi.org/10.1371/journal.pcbi.1006921 |
work_keys_str_mv | AT dattavishaka chipulateacomprehensivechipseqsimulationpipeline AT hannenhallisridhar chipulateacomprehensivechipseqsimulationpipeline AT siddharthanrahul chipulateacomprehensivechipseqsimulationpipeline |