Bio-Docklets: virtualization containers for single-step execution of NGS pipelines

Bibliographic Details
Main authors: Kim, Baekdoo; Ali, Thahmina; Lijeron, Carlos; Afgan, Enis; Krampis, Konstantinos
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5569920/
https://www.ncbi.nlm.nih.gov/pubmed/28854616
http://dx.doi.org/10.1093/gigascience/gix048
collection PubMed
description Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines, for RNA sequencing and for chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint, so that, from a user perspective, running a pipeline is as simple as running a single bioinformatics tool. This is achieved using a "meta-script" that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, a university computer cluster, or a cloud service provider. Beyond end users, Bio-Docklets also enable developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets.
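The abstract describes a meta-script that starts Bio-Docklet containers and then drives each pipeline through BioBlend and the Galaxy API, with many instances running concurrently. The Python sketch below illustrates only the planning part of such a script: one container per dataset, each published on its own host port, plus the workflow input mapping in the shape BioBlend's `invoke_workflow()` accepts. It is a hypothetical illustration, not the authors' code. The image name, the Galaxy port inside the container (80), the `/data` mount point, the `biodocklet-N` container names, and the workflow input index `"0"` are all assumptions.

```python
"""Hypothetical planning step for a Bio-Docklets-style meta-script.

Assumptions (not from the paper): Galaxy listens on port 80 inside the
container, inputs are mounted at /data, and the workflow's first input
step has index "0". Names such as plan_runs are illustrative only.
"""
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DockletRun:
    dataset: Path        # host path of one input dataset
    container_name: str  # unique container name so concurrent runs do not clash
    host_port: int       # host port mapped to Galaxy inside the container

    def docker_command(self, image: str) -> list[str]:
        """Argument list for `docker run`, suitable for subprocess.run()."""
        return [
            "docker", "run", "--detach",
            "--name", self.container_name,
            # Publish the container's Galaxy port (assumed 80) on a per-run host port.
            "--publish", f"{self.host_port}:80",
            # Mount the dataset's directory at the hypothetical /data mount point.
            "--volume", f"{self.dataset.parent.resolve()}:/data",
            image,
        ]

    def galaxy_url(self) -> str:
        """Base URL a BioBlend GalaxyInstance would connect to for this run."""
        return f"http://localhost:{self.host_port}"


def plan_runs(datasets: list[str], base_port: int = 8080) -> list[DockletRun]:
    """Plan one container per dataset, each on its own host port."""
    return [
        DockletRun(Path(ds), f"biodocklet-{i}", base_port + i)
        for i, ds in enumerate(datasets)
    ]


def workflow_inputs(dataset_id: str) -> dict:
    """Inputs mapping for BioBlend's WorkflowClient.invoke_workflow():
    workflow input step "0" receives a history dataset ("hda")."""
    return {"0": {"id": dataset_id, "src": "hda"}}


if __name__ == "__main__":
    for run in plan_runs(["sample1.fastq", "sample2.fastq"]):
        print(" ".join(run.docker_command("example/biodocklet-rnaseq")))
        print("  Galaxy API at", run.galaxy_url())
```

In a complete script, each command list could be passed to `subprocess.run(cmd, check=True)`, after which `bioblend.galaxy.GalaxyInstance(url=run.galaxy_url(), key=...)` could upload the dataset and call `gi.workflows.invoke_workflow(workflow_id, inputs=workflow_inputs(dataset_id), history_id=...)`. Those steps need a running Docker daemon, a started Galaxy server, and an API key, so they are left out of this sketch. Giving every run its own container name and host port is what lets many pipeline instances run side by side on one machine.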
id pubmed-5569920
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Gigascience (Oxford University Press, 2017-06-27)
rights © The Authors 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Technical Note