Octopus-toolkit: a workflow to automate mining of public epigenomic and transcriptomic next-generation sequencing data

Octopus-toolkit is a stand-alone application for retrieving and processing large sets of next-generation sequencing (NGS) data in a single step. Octopus-toolkit is an automated set-up-and-analysis pipeline utilizing the Aspera, SRA Toolkit, FastQC, Trimmomatic, HISAT2, STAR, Samtools, and HOMER ap...

Bibliographic Details
Main authors: Kim, Taemook; Seo, Hogyu David; Hennighausen, Lothar; Lee, Daeyoup; Kang, Keunsoo
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5961211/
https://www.ncbi.nlm.nih.gov/pubmed/29420797
http://dx.doi.org/10.1093/nar/gky083
author Kim, Taemook
Seo, Hogyu David
Hennighausen, Lothar
Lee, Daeyoup
Kang, Keunsoo
collection PubMed
description Octopus-toolkit is a stand-alone application for retrieving and processing large sets of next-generation sequencing (NGS) data in a single step. Octopus-toolkit is an automated set-up-and-analysis pipeline utilizing the Aspera, SRA Toolkit, FastQC, Trimmomatic, HISAT2, STAR, Samtools, and HOMER applications. All the applications are installed on the user's computer when the program starts. After installation, it can automatically retrieve original files of various epigenomic and transcriptomic data sets, including ChIP-seq, ATAC-seq, DNase-seq, MeDIP-seq, MNase-seq and RNA-seq, from the Gene Expression Omnibus (GEO) data repository. The downloaded files can then be sequentially processed to generate BAM and BigWig files, which are used for advanced analyses and visualization. Currently, it can process NGS data from popular model genomes such as human (Homo sapiens), mouse (Mus musculus), dog (Canis lupus familiaris), plant (Arabidopsis thaliana), zebrafish (Danio rerio), fruit fly (Drosophila melanogaster), worm (Caenorhabditis elegans), and budding yeast (Saccharomyces cerevisiae). With the processed files from Octopus-toolkit, users can easily conduct meta-analyses of various data sets, motif searches for DNA-binding proteins, and the identification of differentially expressed genes and/or protein-binding sites with a few commands. Overall, Octopus-toolkit facilitates the systematic and integrative analysis of available epigenomic and transcriptomic NGS big data.
format Online
Article
Text
id pubmed-5961211
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5961211 2018-06-06 Nucleic Acids Res Methods Online Oxford University Press 2018-05-18 2018-02-06 /pmc/articles/PMC5961211/ /pubmed/29420797 http://dx.doi.org/10.1093/nar/gky083 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Octopus-toolkit: a workflow to automate mining of public epigenomic and transcriptomic next-generation sequencing data
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5961211/
https://www.ncbi.nlm.nih.gov/pubmed/29420797
http://dx.doi.org/10.1093/nar/gky083