
mzMLb: A Future-Proof Raw Mass Spectrometry Data Format Based on Standards-Compliant mzML and Optimized for Speed and Storage Requirements

With ever-increasing amounts of data produced by mass spectrometry (MS) proteomics and metabolomics, and the sheer volume of samples now analyzed, the need for a common open format possessing both file size efficiency and faster read/write speeds has become paramount to drive the n...


Bibliographic Details
Main authors: Bhamber, Ranjeet S.; Jankevics, Andris; Deutsch, Eric W.; Jones, Andrew R.; Dowsey, Andrew W.
Format: Online Article Text
Language: English
Published: American Chemical Society 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7871438/
https://www.ncbi.nlm.nih.gov/pubmed/32864978
http://dx.doi.org/10.1021/acs.jproteome.0c00192
author Bhamber, Ranjeet S.
Jankevics, Andris
Deutsch, Eric W.
Jones, Andrew R.
Dowsey, Andrew W.
collection PubMed
description With ever-increasing amounts of data produced by mass spectrometry (MS) proteomics and metabolomics, and the sheer volume of samples now analyzed, the need for a common open format possessing both file size efficiency and faster read/write speeds has become paramount to drive the next generation of data analysis pipelines. The Proteomics Standards Initiative (PSI) has established a clear and precise extensible markup language (XML) representation for data interchange, mzML, receiving substantial uptake; nevertheless, storage and file access efficiency has not been the main focus. We propose an HDF5 file format “mzMLb” that is optimized for both read/write speed and storage of the raw mass spectrometry data. We provide an extensive validation of the write speed, random read speed, and storage size, demonstrating a flexible format that with or without compression is faster than all existing approaches in virtually all cases, while with compression is comparable in size to proprietary vendor file formats. Since our approach uniquely preserves the XML encoding of the metadata, the format implicitly supports future versions of mzML and is straightforward to implement: mzMLb’s design adheres to both HDF5 and NetCDF4 standard implementations, which allows it to be easily utilized by third parties due to their widespread programming language support. A reference implementation within the established ProteoWizard toolkit is provided.
format Online
Article
Text
id pubmed-7871438
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-7871438 2021-02-10 J Proteome Res American Chemical Society 2020-08-31 2021-01-01 /pmc/articles/PMC7871438/ /pubmed/32864978 http://dx.doi.org/10.1021/acs.jproteome.0c00192 Text en Made available through a Creative Commons CC-BY License (http://pubs.acs.org/page/policy/authorchoice_ccby_termsofuse.html)
title mzMLb: A Future-Proof Raw Mass Spectrometry Data Format Based on Standards-Compliant mzML and Optimized for Speed and Storage Requirements
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7871438/
https://www.ncbi.nlm.nih.gov/pubmed/32864978
http://dx.doi.org/10.1021/acs.jproteome.0c00192