
Datastorr: a workflow and package for delivering successive versions of 'evolving data' directly into R


Bibliographic Details
Main Authors: Falster, Daniel S; FitzJohn, Richard G; Pennell, Matthew W; Cornwell, William K
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6506717/
https://www.ncbi.nlm.nih.gov/pubmed/31042286
http://dx.doi.org/10.1093/gigascience/giz035
author Falster, Daniel S
FitzJohn, Richard G
Pennell, Matthew W
Cornwell, William K
collection PubMed
description The sharing and re-use of data has become a cornerstone of modern science. Multiple platforms now allow easy publication of datasets. So far, however, platforms for data sharing offer limited functions for distributing and interacting with evolving datasets: those that continue to grow with time as more records are added, errors are fixed, and new data structures are created. In this article, we describe a workflow for maintaining and distributing successive versions of an evolving dataset, allowing users to retrieve and load different versions directly into the R platform. Our workflow uses tools and platforms employed in the development and distribution of successive versions of an open source software program, including version control, GitHub, and semantic versioning, and applies them to the analogous process of developing successive versions of an open source dataset. Moreover, we argue that this model allows individual research groups to achieve a dynamic and versioned model of data delivery at no cost.
format Online
Article
Text
id pubmed-6506717
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6506717 2019-05-13
Gigascience
Technical Note
Oxford University Press 2019-05-01
/pmc/articles/PMC6506717/ /pubmed/31042286 http://dx.doi.org/10.1093/gigascience/giz035
Text en
© The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Datastorr: a workflow and package for delivering successive versions of 'evolving data' directly into R
topic Technical Note