Datastorr: a workflow and package for delivering successive versions of 'evolving data' directly into R

The sharing and re-use of data has become a cornerstone of modern science. Multiple platforms now allow easy publication of datasets. So far, however, platforms for data sharing offer limited functions for distributing and interacting with evolving datasets, those that continue to grow with time as...

Bibliographic Details
Main Authors:	Falster, Daniel S, FitzJohn, Richard G, Pennell, Matthew W, Cornwell, William K
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2019
Subjects:	Technical Note
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6506717/ https://www.ncbi.nlm.nih.gov/pubmed/31042286 http://dx.doi.org/10.1093/gigascience/giz035

author	Falster, Daniel S FitzJohn, Richard G Pennell, Matthew W Cornwell, William K
collection	PubMed
description	The sharing and re-use of data has become a cornerstone of modern science. Multiple platforms now allow easy publication of datasets. So far, however, platforms for data sharing offer limited functions for distributing and interacting with evolving datasets, those that continue to grow with time as more records are added, errors are fixed, and new data structures are created. In this article, we describe a workflow for maintaining and distributing successive versions of an evolving dataset, allowing users to retrieve and load different versions directly into the R platform. Our workflow uses the tools and platforms that support the development and distribution of successive versions of an open source software program, including version control, GitHub, and semantic versioning, and applies them to the analogous process of developing successive versions of an open source dataset. Moreover, we argue that this model allows individual research groups to achieve a dynamic and versioned model of data delivery at no cost.
format	Online Article Text
id	pubmed-6506717
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6506717 2019-05-13 Datastorr: a workflow and package for delivering successive versions of 'evolving data' directly into R Falster, Daniel S FitzJohn, Richard G Pennell, Matthew W Cornwell, William K Gigascience Technical Note Oxford University Press 2019-05-01 /pmc/articles/PMC6506717/ /pubmed/31042286 http://dx.doi.org/10.1093/gigascience/giz035 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Datastorr: a workflow and package for delivering successive versions of 'evolving data' directly into R
topic	Technical Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6506717/ https://www.ncbi.nlm.nih.gov/pubmed/31042286 http://dx.doi.org/10.1093/gigascience/giz035
