CIPHER: a flexible and extensive workflow platform for integrative next-generation sequencing data analysis and genomic regulatory element prediction

BACKGROUND: Next-generation sequencing (NGS) approaches are commonly used to identify key regulatory networks that drive transcriptional programs. Although these technologies are frequently used in biological studies, NGS data analysis remains a challenging, time-consuming, and often irreproducible...

Bibliographic Details
Main authors: Guzman, Carlos, D’Orso, Iván
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5549294/
https://www.ncbi.nlm.nih.gov/pubmed/28789639
http://dx.doi.org/10.1186/s12859-017-1770-1
_version_ 1783255946741415936
author Guzman, Carlos
D’Orso, Iván
author_facet Guzman, Carlos
D’Orso, Iván
author_sort Guzman, Carlos
collection PubMed
description BACKGROUND: Next-generation sequencing (NGS) approaches are commonly used to identify key regulatory networks that drive transcriptional programs. Although these technologies are frequently used in biological studies, NGS data analysis remains a challenging, time-consuming, and often irreproducible process. Therefore, there is a need for a comprehensive and flexible workflow platform that can accelerate data processing and analysis so more time can be spent on functional studies. RESULTS: We have developed an integrative, stand-alone workflow platform, named CIPHER, for the systematic analysis of several commonly used NGS datasets including ChIP-seq, RNA-seq, MNase-seq, DNase-seq, GRO-seq, and ATAC-seq data. CIPHER implements various open-source software packages, in-house scripts, and Docker containers to analyze and process single-end and paired-end datasets. CIPHER’s pipelines conduct extensive quality and contamination control checks, as well as comprehensive downstream analysis. A typical CIPHER workflow includes: (1) raw sequence evaluation, (2) read trimming and adapter removal, (3) read mapping and quality filtering, (4) visualization track generation, and (5) extensive quality control assessment. Furthermore, CIPHER conducts downstream analysis such as: narrow and broad peak calling, peak annotation, and motif identification for ChIP-seq; differential gene expression analysis for RNA-seq; nucleosome positioning for MNase-seq; DNase hypersensitive site mapping, site annotation, and motif identification for DNase-seq; analysis of nascent transcription from Global Run-On sequencing (GRO-seq) data; and characterization of chromatin accessibility from ATAC-seq datasets. In addition, CIPHER contains an “analysis” mode that completes complex bioinformatics tasks such as enhancer discovery and provides functions to integrate various datasets.
CONCLUSIONS: Using public and simulated data, we demonstrate that CIPHER is an efficient and comprehensive workflow platform that can analyze several NGS datasets commonly used in genome biology studies. Additionally, CIPHER’s integrative “analysis” mode allows researchers to elicit important biological information from the combined dataset analysis. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1770-1) contains supplementary material, which is available to authorized users.
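The workflow described in the abstract above (five shared processing steps followed by assay-specific downstream analysis) can be sketched as a simple plan builder. This is only an illustrative sketch: the names `CORE_STAGES`, `DOWNSTREAM`, and `plan_workflow` are hypothetical and are not part of CIPHER; the stage labels merely paraphrase the abstract.

```python
# Illustrative sketch only: these names are hypothetical, not CIPHER's real
# interface. Stage labels paraphrase the abstract.

# The five steps the abstract lists for a typical workflow, in order.
CORE_STAGES = [
    "raw sequence evaluation",
    "read trimming and adapter removal",
    "read mapping and quality filtering",
    "visualization track generation",
    "quality control assessment",
]

# Downstream analyses the abstract names for each supported data type.
DOWNSTREAM = {
    "ChIP-seq": ["narrow and broad peak calling", "peak annotation",
                 "motif identification"],
    "RNA-seq": ["differential gene expression analysis"],
    "MNase-seq": ["nucleosome positioning"],
    "DNase-seq": ["hypersensitive site mapping", "site annotation",
                  "motif identification"],
    "GRO-seq": ["nascent transcription analysis"],
    "ATAC-seq": ["chromatin accessibility characterization"],
}


def plan_workflow(assay):
    """Return the ordered step list for one assay: core stages, then downstream."""
    if assay not in DOWNSTREAM:
        raise ValueError(f"unsupported assay: {assay!r}")
    return CORE_STAGES + DOWNSTREAM[assay]


if __name__ == "__main__":
    for step_number, step in enumerate(plan_workflow("ChIP-seq"), start=1):
        print(step_number, step)
```

Keeping the shared stages separate from the per-assay table mirrors the abstract's distinction between the common preprocessing steps and the data-type-specific analyses.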
format Online
Article
Text
id pubmed-5549294
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5549294 2017-08-11 CIPHER: a flexible and extensive workflow platform for integrative next-generation sequencing data analysis and genomic regulatory element prediction Guzman, Carlos D’Orso, Iván BMC Bioinformatics Software BioMed Central 2017-08-08 /pmc/articles/PMC5549294/ /pubmed/28789639 http://dx.doi.org/10.1186/s12859-017-1770-1 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title CIPHER: a flexible and extensive workflow platform for integrative next-generation sequencing data analysis and genomic regulatory element prediction
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5549294/
https://www.ncbi.nlm.nih.gov/pubmed/28789639
http://dx.doi.org/10.1186/s12859-017-1770-1