
Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System

OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide rang...


Bibliographic Details
Main Authors: Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C.; Parisot, Sarah; Rueckert, Daniel
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2017
Subjects: Neuroscience
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5361107/
https://www.ncbi.nlm.nih.gov/pubmed/28381997
http://dx.doi.org/10.3389/fninf.2017.00021
author Passerat-Palmbach, Jonathan
Reuillon, Romain
Leclaire, Mathieu
Makropoulos, Antonios
Robinson, Emma C.
Parisot, Sarah
Rueckert, Daniel
author_sort Passerat-Palmbach, Jonathan
collection PubMed
description OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high-level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script-based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters or the European Grid Infrastructure (EGI).
format Online
Article
Text
id pubmed-5361107
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-5361107 2017-04-05 Front Neuroinform Neuroscience Frontiers Media S.A. 2017-03-22 /pmc/articles/PMC5361107/ /pubmed/28381997 http://dx.doi.org/10.3389/fninf.2017.00021 Text en Copyright © 2017 Passerat-Palmbach, Reuillon, Leclaire, Makropoulos, Robinson, Parisot and Rueckert. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System
topic Neuroscience