Compression of FASTQ and SAM Format Sequencing Data
Main authors: | Bonfield, James K.; Mahoney, Matthew V. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2013 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3606433/ https://www.ncbi.nlm.nih.gov/pubmed/23533605 http://dx.doi.org/10.1371/journal.pone.0059190 |
Field | Value |
---|---|
author | Bonfield, James K.; Mahoney, Matthew V. |
collection | PubMed |
description | Storage and transmission of the data produced by modern DNA sequencing instruments have become a major concern, which prompted the Pistoia Alliance to pose the SequenceSqueeze contest for compression of FASTQ files. We present several compression entries from the competition, Fastqz and Samcomp/Fqzcomp, including the winning entry. These are compared against existing reference-based (CRAM, Goby) and non-reference-based (DSRC, BAM) compression algorithms, as well as other recently published competition entries (Quip, SCALCE). The tools are shown to form the new Pareto frontier for FASTQ compression, offering state-of-the-art compression ratios at affordable CPU costs. All programs are freely available on SourceForge. Fastqz: https://sourceforge.net/projects/fastqz/, fqzcomp: https://sourceforge.net/projects/fqzcomp/, and samcomp: https://sourceforge.net/projects/samcomp/. |
format | Online Article Text |
id | pubmed-3606433 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
journal | PLoS One |
article type | Research Article |
published online | 2013-03-22 |
rights | © 2013 Bonfield, Mahoney. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Compression of FASTQ and SAM Format Sequencing Data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3606433/ https://www.ncbi.nlm.nih.gov/pubmed/23533605 http://dx.doi.org/10.1371/journal.pone.0059190 |