
The arc of Mass Spectrometry Exchange Formats is long, but it bends toward HDF5

The evolution of data exchange in Mass Spectrometry spans decades and has ranged from human‐readable text files representing individual scans or collections thereof (McDonald et al., 2004) through the official standard XML‐based (Harold, Means, & Udemadu, 2005) data interchange standard (Deutsch...


Bibliographic Details
Main Authors: Askenazi, Manor, Ben Hamidane, Hisham, Graumann, Johannes
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6088231/
https://www.ncbi.nlm.nih.gov/pubmed/27741559
http://dx.doi.org/10.1002/mas.21522
collection PubMed
description The evolution of data exchange in Mass Spectrometry spans decades and has ranged from human‐readable text files representing individual scans or collections thereof (McDonald et al., 2004) through the official standard XML‐based (Harold, Means, & Udemadu, 2005) data interchange standard (Deutsch, 2012), to increasingly compressed (Teleman et al., 2014) variants of this standard sometimes requiring purely binary adjunct files (Römpp et al., 2011). While the desire to maintain even partial human readability is understandable, the inherent mismatch between XML's textual and irregular format relative to the numeric and highly regular nature of actual spectral data, along with the explosive growth in dataset scales and the resulting need for efficient (binary and indexed) access, has led to a phenomenon referred to as “technical drift” (Davis, 2013). While the drift is being continuously corrected using adjunct formats, compression schemes, and programs (Röst et al., 2015), we propose that the future of Mass Spectrometry Exchange Formats lies in the continued reliance on and development of the PSI‐MS (Mayer et al., 2014) controlled vocabulary, along with an expedited shift to an alternative, thriving and well‐supported ecosystem for scientific data‐exchange, storage, and access in binary form, namely that of HDF5 (Koranne, 2011). Indeed, pioneering efforts to leverage this universal, binary, and hierarchical data‐format have already been published (Wilhelm et al., 2012; Rübel et al., 2013), though they have under‐utilized self‐description, a key property shared by HDF5 and XML. We demonstrate that a straightforward usage of plain (“vanilla”) HDF5 yields immediate returns including, but not limited to, highly efficient data access, platform‐independent data viewers, a variety of libraries (Collette, 2014) for data retrieval and manipulation in many programming languages, and remote data access through comprehensive RESTful data‐servers. © 2016 The Authors. Mass Spectrometry Reviews published by Wiley Periodicals, Inc. Mass Spec Rev 36:668–673, 2017
id pubmed-6088231
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6088231 2018-08-17 Mass Spectrom Rev Review Articles John Wiley and Sons Inc. 2016-10-14 2017 /pmc/articles/PMC6088231/ /pubmed/27741559 http://dx.doi.org/10.1002/mas.21522 Text en © 2016 by The Authors. Mass Spectrometry Reviews published by Wiley Periodicals, Inc. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
topic Review Articles