
TERSE/PROLIX (TRPX) – a new algorithm for fast and lossless compression and decompression of diffraction and cryo-EM data

High-throughput data collection in crystallography poses significant challenges in handling massive amounts of data. Here, TERSE/PROLIX (or TRPX for short) is presented, a novel lossless compression algorithm specifically designed for diffraction data. The algorithm is compared with established loss...

Bibliographic Details
Main authors: Matinyan, Senik, Abrahams, Jan Pieter
Format: Online Article Text
Language: English
Published: International Union of Crystallography 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10626653/
https://www.ncbi.nlm.nih.gov/pubmed/37743849
http://dx.doi.org/10.1107/S205327332300760X
author Matinyan, Senik
Abrahams, Jan Pieter
collection PubMed
description High-throughput data collection in crystallography poses significant challenges in handling massive amounts of data. Here, TERSE/PROLIX (or TRPX for short) is presented, a novel lossless compression algorithm specifically designed for diffraction data. The algorithm is compared with established lossless compression algorithms implemented in gzip, bzip2, CBF (crystallographic binary file), Zstandard (zstd), LZ4 and HDF5 with gzip, LZF and bitshuffle+LZ4 filters, in terms of compression efficiency and speed, using continuous-rotation electron diffraction data of an inorganic compound and raw cryo-EM data. The results show that TRPX significantly outperforms all these algorithms in terms of speed and compression rate. It was 60 times faster than bzip2 (which achieved a similar compression rate), and more than 3 times faster than LZ4, which was the runner-up in terms of speed, but had a much worse compression rate. TRPX files are byte-order independent and upon compilation the algorithm occupies very little memory. It can therefore be readily implemented in hardware. By providing a tailored solution for diffraction and raw cryo-EM data, TRPX facilitates more efficient data analysis and interpretation while mitigating storage and transmission concerns. The C++20 compression/decompression code, custom TIFF library and an ImageJ/Fiji Java plugin for reading TRPX files are open-sourced on GitHub under the permissive MIT license.
format Online
Article
Text
id pubmed-10626653
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher International Union of Crystallography
record_format MEDLINE/PubMed
spelling pubmed-10626653 2023-11-07 TERSE/PROLIX (TRPX) – a new algorithm for fast and lossless compression and decompression of diffraction and cryo-EM data Matinyan, Senik Abrahams, Jan Pieter Acta Crystallogr A Found Adv Research Papers
International Union of Crystallography 2023-09-25 /pmc/articles/PMC10626653/ /pubmed/37743849 http://dx.doi.org/10.1107/S205327332300760X Text en © Matinyan and Abrahams 2023 https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
spellingShingle Research Papers
Matinyan, Senik
Abrahams, Jan Pieter
TERSE/PROLIX (TRPX) – a new algorithm for fast and lossless compression and decompression of diffraction and cryo-EM data
title TERSE/PROLIX (TRPX) – a new algorithm for fast and lossless compression and decompression of diffraction and cryo-EM data
topic Research Papers