APRICOT: Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools

Background  Scientific publications are meant to exchange knowledge among researchers but the inability to properly reproduce computational experiments limits the quality of scientific research. Furthermore, bibliography shows that irreproducible preclinical research exceeds 50%, which produces a hu...

Bibliographic Details
Main authors: Giménez-Alventosa, Vicent, Segrelles, José Damián, Moltó, Germán, Roca-Sogorb, Mar
Format: Online Article Text
Language: English
Published: Georg Thieme Verlag KG 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7746519/
https://www.ncbi.nlm.nih.gov/pubmed/32777825
http://dx.doi.org/10.1055/s-0040-1712460
author Giménez-Alventosa, Vicent
Segrelles, José Damián
Moltó, Germán
Roca-Sogorb, Mar
author_facet Giménez-Alventosa, Vicent
Segrelles, José Damián
Moltó, Germán
Roca-Sogorb, Mar
author_sort Giménez-Alventosa, Vicent
collection PubMed
description Background: Scientific publications are meant to exchange knowledge among researchers, but the inability to properly reproduce computational experiments limits the quality of scientific research. Furthermore, the literature shows that more than 50% of preclinical research is irreproducible, which produces a huge waste of resources on unprofitable research in the life sciences field. As a consequence, scientific reproducibility is being fostered to promote Open Science through open databases and software tools that are typically deployed on existing computational resources. However, some computational experiments require complex virtual infrastructures, such as elastic clusters of PCs, that can be dynamically provisioned from multiple clouds. Obtaining these infrastructures requires not only an infrastructure provider, but also advanced knowledge of cloud computing.
Objectives: The main aim of this paper is to improve reproducibility in the life sciences to produce better and more cost-effective research. To that end, we intend to simplify infrastructure usage and deployment for researchers.
Methods: This paper introduces the Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools (APRICOT), an open source extension for Jupyter that deploys deterministic virtual infrastructures across multiple clouds for reproducible scientific computational experiments. To illustrate its use and how APRICOT can improve the reproduction of experiments with complex computational requirements, two examples from the life sciences are provided. All requirements to reproduce both experiments are disclosed within APRICOT, so users can reproduce them.
Results: To show the capabilities of APRICOT, we processed a real magnetic resonance image to accurately characterize a prostate cancer using a Message Passing Interface cluster deployed automatically with APRICOT. The second example shows how APRICOT scales the deployed infrastructure according to the workload, using a batch cluster. This example consists of a multiparametric study of a positron emission tomography image reconstruction.
Conclusion: APRICOT's benefits are the integration of specific infrastructure deployment, management, and usage for Open Science, making experiments that involve specific computational infrastructures reproducible. All the experiment steps and details can be documented in the same Jupyter notebook, which includes infrastructure specifications, data storage, experiment execution, results gathering, and infrastructure termination. Thus, distributing the experiment notebook and the required data should be enough to reproduce the experiment.
format Online
Article
Text
id pubmed-7746519
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Georg Thieme Verlag KG
record_format MEDLINE/PubMed
spelling pubmed-7746519 2020-12-21 APRICOT: Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools Giménez-Alventosa, Vicent Segrelles, José Damián Moltó, Germán Roca-Sogorb, Mar Methods Inf Med Georg Thieme Verlag KG 2020-12 2020-08-10 /pmc/articles/PMC7746519/ /pubmed/32777825 http://dx.doi.org/10.1055/s-0040-1712460 Text en The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial-License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. ( https://creativecommons.org/licenses/by-nc-nd/4.0/ ). https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, which permits unrestricted reproduction and distribution, for non-commercial purposes only; and use and reproduction, but not distribution, of adapted material for non-commercial purposes only, provided the original work is properly cited.
title APRICOT: Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7746519/
https://www.ncbi.nlm.nih.gov/pubmed/32777825
http://dx.doi.org/10.1055/s-0040-1712460