
RACS: rapid analysis of ChIP-Seq data for contig based genomes

BACKGROUND: Chromatin immunoprecipitation coupled to next generation sequencing (ChIP-Seq) is a widely-used molecular method to investigate the function of chromatin-related proteins by identifying their associated DNA sequences on a genomic scale. ChIP-Seq generates large quantities of data that is...


Bibliographic Details
Main Authors: Saettone, Alejandro; Ponce, Marcelo; Nabeel-Shah, Syed; Fillingham, Jeffrey
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6819487/
https://www.ncbi.nlm.nih.gov/pubmed/31664892
http://dx.doi.org/10.1186/s12859-019-3100-2
author Saettone, Alejandro
Ponce, Marcelo
Nabeel-Shah, Syed
Fillingham, Jeffrey
collection PubMed
description BACKGROUND: Chromatin immunoprecipitation coupled to next generation sequencing (ChIP-Seq) is a widely-used molecular method to investigate the function of chromatin-related proteins by identifying their associated DNA sequences on a genomic scale. ChIP-Seq generates large quantities of data that is difficult to process and analyze, particularly for organisms with contig-based sequenced genomes that typically have minimal annotation on their associated set of genes other than their associated coordinates, primarily predicted by gene finding programs. Poorly annotated genome sequence makes comprehensive analysis of ChIP-Seq data difficult and as such standardized analysis pipelines are lacking. RESULTS: We present a one-stop computational pipeline, “Rapid Analysis of ChIP-Seq data” (RACS), that utilizes traditional High-Performance Computing (HPC) techniques in association with open source tools for processing and analyzing raw ChIP-Seq data. RACS is an open source computational pipeline available from either of the following repositories: https://bitbucket.org/mjponce/RACS or https://gitrepos.scinet.utoronto.ca/public/?a=summary&p=RACS. RACS is particularly useful for ChIP-Seq in organisms with contig-based genomes that have poor gene annotation to aid protein function discovery. To test the performance and efficiency of RACS, we analyzed previously published ChIP-Seq data from the model organism Tetrahymena thermophila, which has a contig-based genome. We assessed the generality of RACS by analyzing a previously published data set generated using the model organism Oxytricha trifallax, whose genome sequence is also contig-based with poor annotation. CONCLUSIONS: The RACS computational pipeline presented in this report is an efficient and reliable tool to analyze genome-wide raw ChIP-Seq data generated in model organisms with poorly annotated contig-based genome sequence.
Because RACS segregates the found read accumulations between genic and intergenic regions, it is particularly efficient for rapid downstream analyses of proteins involved in gene expression.
format Online
Article
Text
id pubmed-6819487
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6819487 2019-10-31 RACS: rapid analysis of ChIP-Seq data for contig based genomes Saettone, Alejandro Ponce, Marcelo Nabeel-Shah, Syed Fillingham, Jeffrey BMC Bioinformatics Methodology Article BACKGROUND: Chromatin immunoprecipitation coupled to next generation sequencing (ChIP-Seq) is a widely-used molecular method to investigate the function of chromatin-related proteins by identifying their associated DNA sequences on a genomic scale. ChIP-Seq generates large quantities of data that is difficult to process and analyze, particularly for organisms with contig-based sequenced genomes that typically have minimal annotation on their associated set of genes other than their associated coordinates, primarily predicted by gene finding programs. Poorly annotated genome sequence makes comprehensive analysis of ChIP-Seq data difficult and as such standardized analysis pipelines are lacking. RESULTS: We present a one-stop computational pipeline, “Rapid Analysis of ChIP-Seq data” (RACS), that utilizes traditional High-Performance Computing (HPC) techniques in association with open source tools for processing and analyzing raw ChIP-Seq data. RACS is an open source computational pipeline available from either of the following repositories: https://bitbucket.org/mjponce/RACS or https://gitrepos.scinet.utoronto.ca/public/?a=summary&p=RACS. RACS is particularly useful for ChIP-Seq in organisms with contig-based genomes that have poor gene annotation to aid protein function discovery. To test the performance and efficiency of RACS, we analyzed previously published ChIP-Seq data from the model organism Tetrahymena thermophila, which has a contig-based genome. We assessed the generality of RACS by analyzing a previously published data set generated using the model organism Oxytricha trifallax, whose genome sequence is also contig-based with poor annotation.
CONCLUSIONS: The RACS computational pipeline presented in this report is an efficient and reliable tool to analyze genome-wide raw ChIP-Seq data generated in model organisms with poorly annotated contig-based genome sequence. Because RACS segregates the found read accumulations between genic and intergenic regions, it is particularly efficient for rapid downstream analyses of proteins involved in gene expression. BioMed Central 2019-10-29 /pmc/articles/PMC6819487/ /pubmed/31664892 http://dx.doi.org/10.1186/s12859-019-3100-2 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title RACS: rapid analysis of ChIP-Seq data for contig based genomes
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6819487/
https://www.ncbi.nlm.nih.gov/pubmed/31664892
http://dx.doi.org/10.1186/s12859-019-3100-2