Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences

SUMMARY: DNA sequence databases use compression such as gzip to reduce the required storage space and network transmission time. We describe Nucleotide Archival Format (NAF)—a new file format for lossless reference-free compression of FASTA and FASTQ-formatted nucleotide sequences. Nucleotide Archiv...

Bibliographic Details
Main Authors: Kryukov, Kirill; Ueda, Mahoko Takahashi; Nakagawa, So; Imanishi, Tadashi
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Applications Notes
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6761962/
https://www.ncbi.nlm.nih.gov/pubmed/30799504
http://dx.doi.org/10.1093/bioinformatics/btz144
_version_ 1783454131405455360
author Kryukov, Kirill
Ueda, Mahoko Takahashi
Nakagawa, So
Imanishi, Tadashi
author_sort Kryukov, Kirill
collection PubMed
description SUMMARY: DNA sequence databases use compression such as gzip to reduce the required storage space and network transmission time. We describe Nucleotide Archival Format (NAF)—a new file format for lossless reference-free compression of FASTA and FASTQ-formatted nucleotide sequences. Nucleotide Archival Format compression ratio is comparable to the best DNA compressors, while providing dramatically faster decompression. We compared our format with DNA compressors: DELIMINATE and MFCompress, and with general purpose compressors: gzip, bzip2, xz, brotli and zstd. AVAILABILITY AND IMPLEMENTATION: NAF compressor and decompressor, as well as format specification are available at https://github.com/KirillKryukov/naf. Format specification is in public domain. Compressor and decompressor are open source under the zlib/libpng license, free for nearly any use. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6761962
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-67619622019-10-02 Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences Kryukov, Kirill Ueda, Mahoko Takahashi Nakagawa, So Imanishi, Tadashi Bioinformatics Applications Notes SUMMARY: DNA sequence databases use compression such as gzip to reduce the required storage space and network transmission time. We describe Nucleotide Archival Format (NAF)—a new file format for lossless reference-free compression of FASTA and FASTQ-formatted nucleotide sequences. Nucleotide Archival Format compression ratio is comparable to the best DNA compressors, while providing dramatically faster decompression. We compared our format with DNA compressors: DELIMINATE and MFCompress, and with general purpose compressors: gzip, bzip2, xz, brotli and zstd. AVAILABILITY AND IMPLEMENTATION: NAF compressor and decompressor, as well as format specification are available at https://github.com/KirillKryukov/naf. Format specification is in public domain. Compressor and decompressor are open source under the zlib/libpng license, free for nearly any use. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2019-10-01 2019-02-25 /pmc/articles/PMC6761962/ /pubmed/30799504 http://dx.doi.org/10.1093/bioinformatics/btz144 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Applications Notes
Kryukov, Kirill
Ueda, Mahoko Takahashi
Nakagawa, So
Imanishi, Tadashi
Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences
title Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences
topic Applications Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6761962/
https://www.ncbi.nlm.nih.gov/pubmed/30799504
http://dx.doi.org/10.1093/bioinformatics/btz144