Ten simple rules for writing Dockerfiles for reproducible data science
Computational science has been greatly improved by the use of containers for packaging software and data dependencies. In a scholarly context, the main drivers for using these containers are transparency and support of reproducibility; in turn, a workflow’s reproducibility can be greatly affected by...
Main authors: | Nüst, Daniel; Sochat, Vanessa; Marwick, Ben; Eglen, Stephen J.; Head, Tim; Hirst, Tony; Evans, Benjamin D. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2020 |
Subjects: | Editorial |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7654784/ https://www.ncbi.nlm.nih.gov/pubmed/33170857 http://dx.doi.org/10.1371/journal.pcbi.1008316 |
_version_ | 1783608116814807040 |
---|---|
author | Nüst, Daniel Sochat, Vanessa Marwick, Ben Eglen, Stephen J. Head, Tim Hirst, Tony Evans, Benjamin D. |
author_sort | Nüst, Daniel |
collection | PubMed |
description | Computational science has been greatly improved by the use of containers for packaging software and data dependencies. In a scholarly context, the main drivers for using these containers are transparency and support of reproducibility; in turn, a workflow’s reproducibility can be greatly affected by the choices that are made with respect to building containers. In many cases, the build process for the container’s image is created from instructions provided in a Dockerfile format. In support of this approach, we present a set of rules to help researchers write understandable Dockerfiles for typical data science workflows. By following the rules in this article, researchers can create containers suitable for sharing with fellow scientists, for including in scholarly communication such as education or scientific papers, and for effective and sustainable personal workflows. |
format | Online Article Text |
id | pubmed-7654784 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-76547842020-11-18 Ten simple rules for writing Dockerfiles for reproducible data science Nüst, Daniel Sochat, Vanessa Marwick, Ben Eglen, Stephen J. Head, Tim Hirst, Tony Evans, Benjamin D. PLoS Comput Biol Editorial Computational science has been greatly improved by the use of containers for packaging software and data dependencies. In a scholarly context, the main drivers for using these containers are transparency and support of reproducibility; in turn, a workflow’s reproducibility can be greatly affected by the choices that are made with respect to building containers. In many cases, the build process for the container’s image is created from instructions provided in a Dockerfile format. In support of this approach, we present a set of rules to help researchers write understandable Dockerfiles for typical data science workflows. By following the rules in this article, researchers can create containers suitable for sharing with fellow scientists, for including in scholarly communication such as education or scientific papers, and for effective and sustainable personal workflows. Public Library of Science 2020-11-10 /pmc/articles/PMC7654784/ /pubmed/33170857 http://dx.doi.org/10.1371/journal.pcbi.1008316 Text en © 2020 Nüst et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Ten simple rules for writing Dockerfiles for reproducible data science |
topic | Editorial |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7654784/ https://www.ncbi.nlm.nih.gov/pubmed/33170857 http://dx.doi.org/10.1371/journal.pcbi.1008316 |