
Ten simple rules for writing Dockerfiles for reproducible data science

Computational science has been greatly improved by the use of containers for packaging software and data dependencies. In a scholarly context, the main drivers for using these containers are transparency and support of reproducibility; in turn, a workflow’s reproducibility can be greatly affected by...


Bibliographic Details
Main authors: Nüst, Daniel; Sochat, Vanessa; Marwick, Ben; Eglen, Stephen J.; Head, Tim; Hirst, Tony; Evans, Benjamin D.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7654784/
https://www.ncbi.nlm.nih.gov/pubmed/33170857
http://dx.doi.org/10.1371/journal.pcbi.1008316
_version_ 1783608116814807040
author Nüst, Daniel
Sochat, Vanessa
Marwick, Ben
Eglen, Stephen J.
Head, Tim
Hirst, Tony
Evans, Benjamin D.
author_sort Nüst, Daniel
collection PubMed
description Computational science has been greatly improved by the use of containers for packaging software and data dependencies. In a scholarly context, the main drivers for using these containers are transparency and support of reproducibility; in turn, a workflow’s reproducibility can be greatly affected by the choices that are made with respect to building containers. In many cases, the build process for the container’s image is created from instructions provided in a Dockerfile format. In support of this approach, we present a set of rules to help researchers write understandable Dockerfiles for typical data science workflows. By following the rules in this article, researchers can create containers suitable for sharing with fellow scientists, for including in scholarly communication such as education or scientific papers, and for effective and sustainable personal workflows.
format Online
Article
Text
id pubmed-7654784
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-76547842020-11-18 Ten simple rules for writing Dockerfiles for reproducible data science Nüst, Daniel Sochat, Vanessa Marwick, Ben Eglen, Stephen J. Head, Tim Hirst, Tony Evans, Benjamin D. PLoS Comput Biol Editorial Computational science has been greatly improved by the use of containers for packaging software and data dependencies. In a scholarly context, the main drivers for using these containers are transparency and support of reproducibility; in turn, a workflow’s reproducibility can be greatly affected by the choices that are made with respect to building containers. In many cases, the build process for the container’s image is created from instructions provided in a Dockerfile format. In support of this approach, we present a set of rules to help researchers write understandable Dockerfiles for typical data science workflows. By following the rules in this article, researchers can create containers suitable for sharing with fellow scientists, for including in scholarly communication such as education or scientific papers, and for effective and sustainable personal workflows. Public Library of Science 2020-11-10 /pmc/articles/PMC7654784/ /pubmed/33170857 http://dx.doi.org/10.1371/journal.pcbi.1008316 Text en © 2020 Nüst et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Editorial
Nüst, Daniel
Sochat, Vanessa
Marwick, Ben
Eglen, Stephen J.
Head, Tim
Hirst, Tony
Evans, Benjamin D.
Ten simple rules for writing Dockerfiles for reproducible data science
title Ten simple rules for writing Dockerfiles for reproducible data science
topic Editorial
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7654784/
https://www.ncbi.nlm.nih.gov/pubmed/33170857
http://dx.doi.org/10.1371/journal.pcbi.1008316