
BUSZ: compressed BUS files


Bibliographic details
Main authors: Einarsson, Pétur Helgi; Melsted, Páll
Format: Online article, text
Language: English
Published: Oxford University Press, 2023
Subjects: Applications Note
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10185401/
https://www.ncbi.nlm.nih.gov/pubmed/37129540
http://dx.doi.org/10.1093/bioinformatics/btad295
Collection: PubMed
Summary: We describe a compression scheme for BUS files and an implementation of the algorithm in the BUStools software. Our compression algorithm yields smaller file sizes than gzip at significantly faster compression and decompression speeds. We evaluated our algorithm on 533 BUS files from scRNA-seq experiments with a total size of 1 TB. Our compression is 2.2× faster than the fastest gzip option and 35% slower than the fastest zstd option, and it produces files 1.5× smaller than both methods. This amounts to an 8.3× reduction in file size, resulting in a compressed size of 122 GB for the dataset.
Availability and implementation: A complete description of the format is available at https://github.com/BUStools/BUSZ-format and an implementation at https://github.com/BUStools/bustools. The code to reproduce the results of this article is available at https://github.com/pmelsted/BUSZ_paper.
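The summary does not spell out the algorithm itself. As a hedged illustration of why a sorted integer column, such as the barcode column of a sorted BUS file, compresses far better than a generic byte-oriented compressor might suggest, the sketch below chains two generic integer-coding techniques: delta encoding followed by Fibonacci coding. It is an assumption-laden toy, not the BUSZ algorithm; the per-column encodings BUSZ actually uses are defined in the format specification at https://github.com/BUStools/BUSZ-format. All function names are hypothetical and are not part of bustools.

```python
# Hypothetical illustration only: generic delta + Fibonacci coding of a sorted
# integer column. This is NOT the BUSZ algorithm and not a bustools API.

def delta_encode(sorted_values):
    """Replace each value by its difference from the previous one (first value vs 0)."""
    prev = 0
    deltas = []
    for v in sorted_values:
        deltas.append(v - prev)
        prev = v
    return deltas

def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    total = 0
    values = []
    for d in deltas:
        total += d
        values.append(total)
    return values

def fib_encode(n):
    """Fibonacci (Zeckendorf) code of an integer n >= 1, as a bit string ending in '11'."""
    if n < 1:
        raise ValueError("Fibonacci coding only represents integers >= 1")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # the last term exceeds n, so it is never used
    bits = ["0"] * len(fibs)
    remainder = n
    # Greedy from the largest term gives the Zeckendorf representation (no adjacent 1s).
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= remainder:
            bits[i] = "1"
            remainder -= fibs[i]
    return "".join(bits) + "1"  # the extra 1 forms the '11' terminator

def fib_decode_stream(bits):
    """Decode concatenated Fibonacci codewords; '11' marks the end of each codeword."""
    values, fibs = [], [1, 2]
    n, i, prev = 0, 0, "0"
    for b in bits:
        if b == "1" and prev == "1":
            values.append(n)
            n, i, prev = 0, 0, "0"  # reset so the next codeword starts fresh
            continue
        if b == "1":
            while len(fibs) <= i:
                fibs.append(fibs[-1] + fibs[-2])
            n += fibs[i]
        i += 1
        prev = b
    return values

def compress_column(sorted_values):
    # Shift by 1: repeated barcodes give delta 0, which Fibonacci coding cannot represent.
    return "".join(fib_encode(d + 1) for d in delta_encode(sorted_values))

def decompress_column(bits):
    return delta_decode([n - 1 for n in fib_decode_stream(bits)])

if __name__ == "__main__":
    barcodes = [5, 5, 7, 12]
    bits = compress_column(barcodes)
    print(bits, len(bits))  # 1001111001110011 16
    assert decompress_column(bits) == barcodes
```

In this toy, four values that would occupy 4 × 64 = 256 bits as raw 64-bit integers fit in 16 bits, because sorting turns large barcodes into small gaps and Fibonacci codes spend few bits on small numbers. Real BUS data and the real BUSZ encodings will behave differently.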
Record ID: pubmed-10185401
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Bioinformatics (Applications Note); Oxford University Press, 2023-05-02
License: © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.