ChIP-seq Analysis in R (CSAR): An R package for the statistical detection of protein-bound genomic regions

BACKGROUND: In vivo detection of protein-bound genomic regions can be achieved by combining chromatin-immunoprecipitation with next-generation sequencing technology (ChIP-seq). The large amount of sequence data produced by this method needs to be analyzed in a statistically proper and computationall...

Bibliographic Details
Main Authors: Muiño, Jose M; Kaufmann, Kerstin; van Ham, Roeland CHJ; Angenent, Gerco C; Krajewski, Pawel
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3114017/
https://www.ncbi.nlm.nih.gov/pubmed/21554688
http://dx.doi.org/10.1186/1746-4811-7-11
_version_ 1782206021378244608
author Muiño, Jose M
Kaufmann, Kerstin
van Ham, Roeland CHJ
Angenent, Gerco C
Krajewski, Pawel
author_facet Muiño, Jose M
Kaufmann, Kerstin
van Ham, Roeland CHJ
Angenent, Gerco C
Krajewski, Pawel
author_sort Muiño, Jose M
collection PubMed
description BACKGROUND: In vivo detection of protein-bound genomic regions can be achieved by combining chromatin-immunoprecipitation with next-generation sequencing technology (ChIP-seq). The large amount of sequence data produced by this method needs to be analyzed in a statistically proper and computationally efficient manner. The generation of high copy numbers of DNA fragments as an artifact of the PCR step in ChIP-seq is an important source of bias of this methodology. RESULTS: We present here an R package for the statistical analysis of ChIP-seq experiments. Taking the average size of DNA fragments subjected to sequencing into account, the software calculates single-nucleotide read-enrichment values. After normalization, sample and control are compared using a test based on the ratio test or the Poisson distribution. Test statistic thresholds to control the false discovery rate are obtained through random permutations. Computational efficiency is achieved by implementing the most time-consuming functions in C++ and integrating these in the R package. An analysis of simulated and experimental ChIP-seq data is presented to demonstrate the robustness of our method against PCR-artefacts and its adequate control of the error rate. CONCLUSIONS: The software ChIP-seq Analysis in R (CSAR) enables fast and accurate detection of protein-bound genomic regions through the analysis of ChIP-seq experiments. Compared to existing methods, we found that our package shows greater robustness against PCR-artefacts and better control of the error rate.
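The pipeline summarized in the description (extend reads by the average fragment size, compute a per-nucleotide enrichment score against the control, then choose a score threshold from random permutations) can be illustrated with a minimal Python sketch. This is not the CSAR code or its API: the function names (`coverage`, `poisson_score`, `permutation_threshold`) are hypothetical, reads are assumed to lie on the forward strand only, normalization between sample and control is skipped, and a pseudo-count of 1 stands in for an empty control position.

```python
import math
import random


def coverage(read_starts, fragment_len, genome_len):
    """Per-nucleotide read count after extending each read to fragment_len bases.

    Assumption: all reads are on the forward strand and start inside the genome.
    """
    diff = [0] * (genome_len + 1)
    for start in read_starts:
        end = min(start + fragment_len, genome_len)
        diff[start] += 1
        diff[end] -= 1
    cov, running = [], 0
    for d in diff[:genome_len]:
        running += d
        cov.append(running)
    return cov


def poisson_score(k, lam):
    """-log10 of the Poisson upper-tail probability P(X >= k) with mean lam."""
    if k == 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return -math.log10(max(1.0 - cdf, 1e-300))


def scores(sample_reads, control_reads, fragment_len, genome_len):
    """Score every position; control coverage (minimum 1) is the Poisson mean."""
    s = coverage(sample_reads, fragment_len, genome_len)
    c = coverage(control_reads, fragment_len, genome_len)
    return [poisson_score(si, max(ci, 1)) for si, ci in zip(s, c)]


def permutation_threshold(sample_reads, control_reads, fragment_len,
                          genome_len, n_perm=20, fdr=0.05, seed=0):
    """Smallest score threshold whose permutation-estimated FDR is <= fdr.

    Reads are pooled and randomly relabelled as sample or control; the mean
    number of null positions passing a threshold, divided by the number of
    observed positions passing it, estimates the FDR. Returns None if no
    threshold qualifies.
    """
    rng = random.Random(seed)
    observed = scores(sample_reads, control_reads, fragment_len, genome_len)
    pooled = list(sample_reads) + list(control_reads)
    n_sample = len(sample_reads)
    null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        null.extend(scores(pooled[:n_sample], pooled[n_sample:],
                           fragment_len, genome_len))
    for t in sorted(set(observed)):
        n_obs = sum(x >= t for x in observed)
        n_null = sum(x >= t for x in null) / n_perm
        if n_obs and n_null / n_obs <= fdr:
            return t
    return None


if __name__ == "__main__":
    gen = random.Random(1)
    control = [gen.randrange(200) for _ in range(40)]
    # Hypothetical data: background reads plus a cluster of reads near position 100.
    sample = [gen.randrange(200) for _ in range(20)] + [100] * 20
    print(permutation_threshold(sample, control, fragment_len=10, genome_len=200))
```

Positions whose observed score is at or above the returned threshold would be reported as candidate bound regions; a `None` result means no threshold met the requested FDR with the given data.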
format Online
Article
Text
id pubmed-3114017
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-31140172011-06-14 ChIP-seq Analysis in R (CSAR): An R package for the statistical detection of protein-bound genomic regions Muiño, Jose M Kaufmann, Kerstin van Ham, Roeland CHJ Angenent, Gerco C Krajewski, Pawel Plant Methods Software BACKGROUND: In vivo detection of protein-bound genomic regions can be achieved by combining chromatin-immunoprecipitation with next-generation sequencing technology (ChIP-seq). The large amount of sequence data produced by this method needs to be analyzed in a statistically proper and computationally efficient manner. The generation of high copy numbers of DNA fragments as an artifact of the PCR step in ChIP-seq is an important source of bias of this methodology. RESULTS: We present here an R package for the statistical analysis of ChIP-seq experiments. Taking the average size of DNA fragments subjected to sequencing into account, the software calculates single-nucleotide read-enrichment values. After normalization, sample and control are compared using a test based on the ratio test or the Poisson distribution. Test statistic thresholds to control the false discovery rate are obtained through random permutations. Computational efficiency is achieved by implementing the most time-consuming functions in C++ and integrating these in the R package. An analysis of simulated and experimental ChIP-seq data is presented to demonstrate the robustness of our method against PCR-artefacts and its adequate control of the error rate. CONCLUSIONS: The software ChIP-seq Analysis in R (CSAR) enables fast and accurate detection of protein-bound genomic regions through the analysis of ChIP-seq experiments. Compared to existing methods, we found that our package shows greater robustness against PCR-artefacts and better control of the error rate. BioMed Central 2011-05-09 /pmc/articles/PMC3114017/ /pubmed/21554688 http://dx.doi.org/10.1186/1746-4811-7-11 Text en Copyright ©2011 Muiño et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Muiño, Jose M
Kaufmann, Kerstin
van Ham, Roeland CHJ
Angenent, Gerco C
Krajewski, Pawel
ChIP-seq Analysis in R (CSAR): An R package for the statistical detection of protein-bound genomic regions
title ChIP-seq Analysis in R (CSAR): An R package for the statistical detection of protein-bound genomic regions
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3114017/
https://www.ncbi.nlm.nih.gov/pubmed/21554688
http://dx.doi.org/10.1186/1746-4811-7-11