Experimental Directory Structure (Exdir): An Alternative to HDF5 Without Introducing a New File Format

Bibliographic Details
Main Authors: Dragly, Svenn-Arne; Hobbi Mobarhan, Milad; Lepperød, Mikkel E.; Tennøe, Simen; Fyhn, Marianne; Hafting, Torkel; Malthe-Sørenssen, Anders
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2018
Subjects: Neuroscience
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909058/
https://www.ncbi.nlm.nih.gov/pubmed/29706879
http://dx.doi.org/10.3389/fninf.2018.00016
author Dragly, Svenn-Arne
Hobbi Mobarhan, Milad
Lepperød, Mikkel E.
Tennøe, Simen
Fyhn, Marianne
Hafting, Torkel
Malthe-Sørenssen, Anders
collection PubMed
description Natural sciences generate an increasing amount of data in a wide range of formats developed by different research groups and commercial companies. At the same time there is a growing desire to share data along with publications in order to enable reproducible research. Open formats have publicly available specifications which facilitate data sharing and reproducible research. Hierarchical Data Format 5 (HDF5) is a popular open format widely used in neuroscience, often as a foundation for other, more specialized formats. However, drawbacks related to HDF5's complex specification have initiated a discussion for an improved replacement. We propose a novel alternative, the Experimental Directory Structure (Exdir), an open specification for data storage in experimental pipelines which amends drawbacks associated with HDF5 while retaining its advantages. HDF5 stores data and metadata in a hierarchy within a complex binary file which, among other things, is not human-readable, not optimal for version control systems, and lacks support for easy access to raw data from external applications. Exdir, on the other hand, uses file system directories to represent the hierarchy, with metadata stored in human-readable YAML files, datasets stored in binary NumPy files, and raw data stored directly in subdirectories. Furthermore, storing data in multiple files makes it easier to track for version control systems. Exdir is not a file format in itself, but a specification for organizing files in a directory structure. Exdir uses the same abstractions as HDF5 and is compatible with the HDF5 Abstract Data Model. Several research groups are already using data stored in a directory hierarchy as an alternative to HDF5, but no common standard exists. This complicates and limits the opportunity for data sharing and development of common tools for reading, writing, and analyzing data. Exdir facilitates improved data storage, data sharing, reproducible research, and novel insight from interdisciplinary collaboration. With the publication of Exdir, we invite the scientific community to join the development to create an open specification that will serve as many needs as possible and as a foundation for open access to and exchange of data.
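As a rough illustration of the layout the abstract describes (directories for the hierarchy, YAML for metadata, NumPy files for datasets), the following Python sketch writes and reads back one dataset. It does not use the Exdir library and is not taken from the article. The helper names `write_dataset` and `read_dataset`, the file names `data.npy` and `attributes.yaml`, the `experiment.exdir` directory name, and the example values are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np


def write_dataset(parent: Path, name: str, data: np.ndarray, attrs: dict) -> Path:
    """Store one dataset as a directory holding a NumPy file and a YAML file.

    Hypothetical helper: the file names below are illustrative assumptions.
    """
    dataset_dir = parent / name
    dataset_dir.mkdir(parents=True, exist_ok=True)
    # The array goes into a binary NumPy file.
    np.save(dataset_dir / "data.npy", data)
    # Flat scalar attributes only: plain "key: value" lines are valid YAML,
    # so no YAML library is needed for this sketch.
    lines = [f"{key}: {value}" for key, value in attrs.items()]
    (dataset_dir / "attributes.yaml").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    return dataset_dir


def read_dataset(dataset_dir: Path) -> np.ndarray:
    """Load the array back from the dataset directory."""
    return np.load(dataset_dir / "data.npy")


# Build a small hierarchy: root directory -> group -> dataset.
root = Path(tempfile.mkdtemp()) / "experiment.exdir"
voltage = np.array([0.1, 0.2, 0.3])
dataset_dir = write_dataset(
    root / "recording_1", "voltage", voltage,
    {"unit": "mV", "sample_rate_hz": 1000},
)
restored = read_dataset(dataset_dir)

# Every level of the hierarchy is an ordinary path on disk.
layout = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
```

Because each group and dataset is an ordinary directory, the metadata can be opened in a text editor and each file can be tracked separately by a version control system, which is the property the abstract emphasizes.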
format Online
Article
Text
id pubmed-5909058
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-5909058 2018-04-27 Front Neuroinform Neuroscience Frontiers Media S.A. 2018-04-13 /pmc/articles/PMC5909058/ /pubmed/29706879 http://dx.doi.org/10.3389/fninf.2018.00016 Text en Copyright © 2018 Dragly, Hobbi Mobarhan, Lepperød, Tennøe, Fyhn, Hafting and Malthe-Sørenssen. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Experimental Directory Structure (Exdir): An Alternative to HDF5 Without Introducing a New File Format
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909058/
https://www.ncbi.nlm.nih.gov/pubmed/29706879
http://dx.doi.org/10.3389/fninf.2018.00016