MassComp, a lossless compressor for mass spectrometry data

Bibliographic Details
Main authors: Yang, Ruochen; Chen, Xi; Ochoa, Idoia
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6604446/
https://www.ncbi.nlm.nih.gov/pubmed/31262247
http://dx.doi.org/10.1186/s12859-019-2962-7
author Yang, Ruochen
Chen, Xi
Ochoa, Idoia
collection PubMed
description BACKGROUND: Mass spectrometry (MS) is a widely used technique in biological research and has become key to proteomics and metabolomics analyses. As a result, the amount of MS data has increased significantly in recent years; for example, the MS repository MassIVE contains more than 123 TB of data. Surprisingly, these data are stored uncompressed, incurring a significant storage cost. An efficient representation of these data is therefore paramount to lessen the storage burden and facilitate their dissemination. RESULTS: We present MassComp, a lossless compressor optimized for the numerical (m/z)-intensity pairs that account for most of the MS data. We tested MassComp on several MS datasets and show that it reduces the size of the numerical data by 46% on average, and by up to 89%. These results correspond to an average improvement of more than 27% over the general-purpose compressor gzip and of 40% over the state-of-the-art numerical compressor FPC. When tested on entire files retrieved from the MassIVE repository, MassComp achieves an average size reduction of 59%. MassComp is written in C++ and is freely available at https://github.com/iochoa/MassComp. CONCLUSIONS: The compression performance of MassComp demonstrates its potential to significantly reduce the footprint of MS data and shows the benefits of designing specialized compression algorithms tailored to MS data. MassComp is an addition to the family of omics compression algorithms designed to lessen the storage burden and facilitate the exchange and dissemination of omics data.
format Online
Article
Text
id pubmed-6604446
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6604446 2019-07-12 MassComp, a lossless compressor for mass spectrometry data Yang, Ruochen Chen, Xi Ochoa, Idoia BMC Bioinformatics Methodology Article
BioMed Central 2019-07-01 /pmc/articles/PMC6604446/ /pubmed/31262247 http://dx.doi.org/10.1186/s12859-019-2962-7 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title MassComp, a lossless compressor for mass spectrometry data
topic Methodology Article
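The abstract reports results as percentage size reductions on the numerical (m/z)-intensity data and compares them against gzip. The following Python sketch shows how a gzip baseline figure of that kind could be measured. It is not MassComp's algorithm, which the abstract does not describe. The helper names `size_reduction`, `pack_pairs` and `gzip_baseline` are hypothetical. The serialization of each pair as two little-endian 64-bit floats is also an assumption, as is the definition of reduction as 100 × (1 − compressed/original).

```python
import gzip
import struct
from typing import Iterable, Tuple


def size_reduction(original_bytes: int, compressed_bytes: int) -> float:
    """Percentage size reduction, ASSUMED to be 100 * (1 - compressed / original)."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return 100.0 * (1.0 - compressed_bytes / original_bytes)


def pack_pairs(pairs: Iterable[Tuple[float, float]]) -> bytes:
    """Serialize (m/z, intensity) pairs as consecutive little-endian 64-bit floats.

    ASSUMPTION: this byte layout is for illustration only; real MS files
    may store 32- or 64-bit values in other encodings.
    """
    return b"".join(struct.pack("<dd", mz, intensity) for mz, intensity in pairs)


def gzip_baseline(pairs: Iterable[Tuple[float, float]]) -> float:
    """Size reduction (%) that gzip, at maximum level, achieves on the packed pairs."""
    raw = pack_pairs(pairs)
    compressed = gzip.compress(raw, compresslevel=9)
    return size_reduction(len(raw), len(compressed))


if __name__ == "__main__":
    # Hypothetical toy spectrum: evenly spaced m/z values, 7 distinct intensities.
    spectrum = [(100.0 + 0.01 * i, float(i % 7)) for i in range(10_000)]
    print(f"gzip size reduction: {gzip_baseline(spectrum):.1f}%")
```

Because the toy spectrum is synthetic, its output says nothing about the 46% average or 89% maximum reported for MassComp. To compare another lossless compressor on the same bytes, one could substitute it for `gzip.compress` inside `gzip_baseline`.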