Simplifying the development of portable, scalable, and reproducible workflows

Command-line software plays a critical role in biology research. However, processes for installing and executing software differ widely. The Common Workflow Language (CWL) is a community standard that addresses this problem. Using CWL, tool developers can formally describe a tool’s inputs, outputs,...

Bibliographic Details
Main Authors: Piccolo, Stephen R; Ence, Zachary E; Anderson, Elizabeth C; Chang, Jeffrey T; Bild, Andrea H
Format: Online Article Text
Language: English
Published: eLife Sciences Publications, Ltd 2021
Subjects: Computational and Systems Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8514239/
https://www.ncbi.nlm.nih.gov/pubmed/34643507
http://dx.doi.org/10.7554/eLife.71069
author Piccolo, Stephen R
Ence, Zachary E
Anderson, Elizabeth C
Chang, Jeffrey T
Bild, Andrea H
collection PubMed
description Command-line software plays a critical role in biology research. However, processes for installing and executing software differ widely. The Common Workflow Language (CWL) is a community standard that addresses this problem. Using CWL, tool developers can formally describe a tool’s inputs, outputs, and other execution details. CWL documents can include instructions for executing tools inside software containers. Accordingly, CWL tools are portable: they can be executed on diverse computers, including personal workstations, high-performance clusters, and the cloud. CWL also supports workflows, which describe dependencies among tools and how outputs from one tool are used as inputs to others. To date, CWL has been used primarily for batch processing of large datasets, especially in genomics, but it can also be used for the analytical steps of a study. This article explains key concepts about CWL and software containers and provides examples for using CWL in biology research. CWL documents are text-based, so they can be created manually, without computer programming. However, the need to ensure that these documents conform to the CWL specification may deter some users from adopting CWL. To address this gap, we created ToolJig, a Web application that enables researchers to create CWL documents interactively. ToolJig validates the information provided by the user to ensure that it is complete and valid. After creating a CWL tool or workflow, the user can create ‘input-object’ files, which store the values for a particular invocation of a tool or workflow. In addition, ToolJig provides examples of how to execute the tool or workflow via a workflow engine. ToolJig and our examples are available at https://github.com/srp33/ToolJig.
format Online
Article
Text
id pubmed-8514239
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher eLife Sciences Publications, Ltd
record_format MEDLINE/PubMed
spelling pubmed-8514239 2021-10-15 eLife Computational and Systems Biology
eLife Sciences Publications, Ltd 2021-10-13 /pmc/articles/PMC8514239/ /pubmed/34643507 http://dx.doi.org/10.7554/eLife.71069 Text en © 2021, Piccolo et al. https://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use and redistribution provided that the original author and source are credited.
title Simplifying the development of portable, scalable, and reproducible workflows
topic Computational and Systems Biology