MassComp, a lossless compressor for mass spectrometry data
BACKGROUND: Mass Spectrometry (MS) is a widely used technique in biology research, and has become key in proteomics and metabolomics analyses. As a result, the amount of MS data has significantly increased in recent years. For example, the MS repository MassIVE contains more than 123TB of data. Some...
Main authors: | Yang, Ruochen; Chen, Xi; Ochoa, Idoia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6604446/ https://www.ncbi.nlm.nih.gov/pubmed/31262247 http://dx.doi.org/10.1186/s12859-019-2962-7 |
_version_ | 1783431714879569920 |
---|---|
author | Yang, Ruochen Chen, Xi Ochoa, Idoia |
author_facet | Yang, Ruochen Chen, Xi Ochoa, Idoia |
author_sort | Yang, Ruochen |
collection | PubMed |
description | BACKGROUND: Mass Spectrometry (MS) is a widely used technique in biology research, and has become key in proteomics and metabolomics analyses. As a result, the amount of MS data has significantly increased in recent years. For example, the MS repository MassIVE contains more than 123TB of data. Somewhat surprisingly, these data are stored uncompressed, hence incurring a significant storage cost. Efficient representation of these data is therefore paramount to lessen the burden of storage and facilitate their dissemination. RESULTS: We present MassComp, a lossless compressor optimized for the numerical (m/z)-intensity pairs that account for most of the MS data. We tested MassComp on several MS datasets and show that it delivers on average a 46% reduction in the size of the numerical data, and up to 89%. These results correspond to an average improvement of more than 27% when compared to the general compressor gzip and of 40% when compared to the state-of-the-art numerical compressor FPC. When tested on entire files retrieved from the MassIVE repository, MassComp achieves on average a 59% size reduction. MassComp is written in C++ and freely available at https://github.com/iochoa/MassComp. CONCLUSIONS: The compression performance of MassComp demonstrates its potential to significantly reduce the footprint of MS data, and shows the benefits of designing specialized compression algorithms tailored to MS data. MassComp is an addition to the family of omics compression algorithms designed to lessen the storage burden and facilitate the exchange and dissemination of omics data. |
format | Online Article Text |
id | pubmed-6604446 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6604446 2019-07-12 MassComp, a lossless compressor for mass spectrometry data Yang, Ruochen Chen, Xi Ochoa, Idoia BMC Bioinformatics Methodology Article
BioMed Central 2019-07-01 /pmc/articles/PMC6604446/ /pubmed/31262247 http://dx.doi.org/10.1186/s12859-019-2962-7 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Yang, Ruochen Chen, Xi Ochoa, Idoia MassComp, a lossless compressor for mass spectrometry data |
title | MassComp, a lossless compressor for mass spectrometry data |
title_full | MassComp, a lossless compressor for mass spectrometry data |
title_fullStr | MassComp, a lossless compressor for mass spectrometry data |
title_full_unstemmed | MassComp, a lossless compressor for mass spectrometry data |
title_short | MassComp, a lossless compressor for mass spectrometry data |
title_sort | masscomp, a lossless compressor for mass spectrometry data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6604446/ https://www.ncbi.nlm.nih.gov/pubmed/31262247 http://dx.doi.org/10.1186/s12859-019-2962-7 |
work_keys_str_mv | AT yangruochen masscompalosslesscompressorformassspectrometrydata AT chenxi masscompalosslesscompressorformassspectrometrydata AT ochoaidoia masscompalosslesscompressorformassspectrometrydata |
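
The size-reduction percentages quoted in the description can be read as 100 × (1 − compressed size / original size). The Python sketch below illustrates that metric only; it is not MassComp and does not reproduce the paper's measurements. The function names, the synthetic (m/z, intensity) values, and the binary layout (a 64-bit double followed by a 32-bit float per pair) are all assumptions made for illustration, and gzip is used simply because the description names it as a baseline.

```python
import gzip
import random
import struct


def size_reduction(original_size: int, compressed_size: int) -> float:
    """Percent reduction in size: 100 * (1 - compressed / original)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 100.0 * (1.0 - compressed_size / original_size)


def synthetic_pairs(n: int, seed: int = 0) -> bytes:
    """Pack n made-up (m/z, intensity) pairs into a byte string.

    Hypothetical layout: little-endian double for m/z, float for intensity.
    """
    rng = random.Random(seed)
    mz = 100.0
    out = bytearray()
    for _ in range(n):
        mz += rng.uniform(0.01, 1.0)  # assumed: m/z increases along the list
        intensity = rng.uniform(0.0, 1e5)
        out += struct.pack("<df", mz, intensity)
    return bytes(out)


raw = synthetic_pairs(10_000)
packed = gzip.compress(raw)

# Lossless means the original bytes are recovered exactly.
assert gzip.decompress(packed) == raw

print(f"gzip reduction on synthetic pairs: {size_reduction(len(raw), len(packed)):.1f}%")
```

Under this definition, a file that shrinks from 100 bytes to 54 bytes has a 46% reduction. The figure printed for the synthetic data depends entirely on the made-up values and says nothing about real MS files.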