Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences

Bibliographic Details
Main Authors: Kryukov, Kirill; Ueda, Mahoko Takahashi; Nakagawa, So; Imanishi, Tadashi
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6761962/
https://www.ncbi.nlm.nih.gov/pubmed/30799504
http://dx.doi.org/10.1093/bioinformatics/btz144
Description
Summary: DNA sequence databases use compression such as gzip to reduce the required storage space and network transmission time. We describe Nucleotide Archival Format (NAF), a new file format for lossless reference-free compression of FASTA and FASTQ-formatted nucleotide sequences. The compression ratio of NAF is comparable to that of the best DNA compressors, while its decompression is dramatically faster. We compared our format with the DNA compressors DELIMINATE and MFCompress, and with the general-purpose compressors gzip, bzip2, xz, brotli and zstd.
AVAILABILITY AND IMPLEMENTATION: The NAF compressor and decompressor, as well as the format specification, are available at https://github.com/KirillKryukov/naf. The format specification is in the public domain. The compressor and decompressor are open source under the zlib/libpng license, free for nearly any use.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.