
From the desktop to the grid: scalable bioinformatics via workflow conversion

Bibliographic Details
Main authors: de la Garza, Luis; Veit, Johannes; Szolek, Andras; Röttig, Marc; Aiche, Stephan; Gesing, Sandra; Reinert, Knut; Kohlbacher, Oliver
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4788856/
https://www.ncbi.nlm.nih.gov/pubmed/26968893
http://dx.doi.org/10.1186/s12859-016-0978-9
author de la Garza, Luis
Veit, Johannes
Szolek, Andras
Röttig, Marc
Aiche, Stephan
Gesing, Sandra
Reinert, Knut
Kohlbacher, Oliver
author_sort de la Garza, Luis
collection PubMed
description BACKGROUND: Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well-defined tasks, each with well-defined inputs, parameters, and outputs, offers immediate benefits such as identifying bottlenecks and pinpointing sections that could benefit from parallelization. Workflows rest upon the notion of splitting complex work into the joint effort of several manageable tasks. There are several engines that give users the ability to design and execute workflows. Each engine was created to address certain problems of a specific community; therefore, each one has its advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free, an aspect that could potentially drive away members of the scientific community.
RESULTS: We have developed a set of tools that enables the scientific community to benefit from workflow interoperability. We developed a platform-free structured representation of the parameters, inputs, and outputs of command-line tools in so-called Common Tool Descriptor documents. We have also overcome the shortcomings and combined the features of two royalty-free workflow engines with a substantial user community: the Konstanz Information Miner, an engine which we see as a formidable workflow editor, and the Grid and User Support Environment, a web-based framework able to interact with several high-performance computing resources. We have thus created a free and highly accessible way to design workflows on a desktop computer and execute them on high-performance computing resources.
CONCLUSIONS: Our work will not only reduce the time spent on designing scientific workflows, but also make executing workflows on remote high-performance computing resources more accessible to technically inexperienced users. We strongly believe that our efforts not only decrease the turnaround time to obtain scientific results but also have a positive impact on reproducibility, thus elevating the quality of the scientific results obtained.
format Online
Article
Text
id pubmed-4788856
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4788856 2016-03-13 From the desktop to the grid: scalable bioinformatics via workflow conversion BMC Bioinformatics Software BioMed Central 2016-03-12 /pmc/articles/PMC4788856/ /pubmed/26968893 http://dx.doi.org/10.1186/s12859-016-0978-9 Text en © de la Garza et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title From the desktop to the grid: scalable bioinformatics via workflow conversion
topic Software