ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data
BACKGROUND: Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been devel...
Main authors: | Zhu, Lihua J; Gazin, Claude; Lawson, Nathan D; Pagès, Hervé; Lin, Simon M; Lapointe, David S; Green, Michael R |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2010 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098059/ https://www.ncbi.nlm.nih.gov/pubmed/20459804 http://dx.doi.org/10.1186/1471-2105-11-237 |
_version_ | 1782203907887333376 |
---|---|
author | Zhu, Lihua J Gazin, Claude Lawson, Nathan D Pagès, Hervé Lin, Simon M Lapointe, David S Green, Michael R |
author_sort | Zhu, Lihua J |
collection | PubMed |
description | BACKGROUND: Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome. RESULTS: We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiment resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table or a pie chart, or plotted as a histogram of the distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or between binding sites of transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. CONCLUSIONS: ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. Allowing users to pass their own annotation data, such as a different chromatin immunoprecipitation (ChIP) preparation or a dataset from the literature, or to use existing annotation packages such as GenomicFeatures and BSgenome, provides flexibility. Tight integration with the biomaRt package enables up-to-date annotation retrieval from the BioMart database. |
format | Text |
id | pubmed-3098059 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3098059 2011-05-20 ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data Zhu, Lihua J Gazin, Claude Lawson, Nathan D Pagès, Hervé Lin, Simon M Lapointe, David S Green, Michael R BMC Bioinformatics Software BioMed Central 2010-05-11 /pmc/articles/PMC3098059/ /pubmed/20459804 http://dx.doi.org/10.1186/1471-2105-11-237 Text en Copyright ©2010 Zhu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098059/ https://www.ncbi.nlm.nih.gov/pubmed/20459804 http://dx.doi.org/10.1186/1471-2105-11-237 |