CRAM 3.1: advances in the CRAM file format
MOTIVATION: CRAM has established itself as a high compression alternative to the BAM file format for DNA sequencing data. We describe updates to further improve this on modern sequencing instruments. RESULTS: With Illumina data CRAM 3.1 is 7–15% smaller than the equivalent CRAM 3.0 file, and 50–70%...
Main author: | Bonfield, James K |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8896640/ https://www.ncbi.nlm.nih.gov/pubmed/34999766 http://dx.doi.org/10.1093/bioinformatics/btac010 |
_version_ | 1784663205264490496 |
---|---|
author | Bonfield, James K |
collection | PubMed |
description | MOTIVATION: CRAM has established itself as a high compression alternative to the BAM file format for DNA sequencing data. We describe updates to further improve this on modern sequencing instruments. RESULTS: With Illumina data CRAM 3.1 is 7–15% smaller than the equivalent CRAM 3.0 file, and 50–70% smaller than the corresponding BAM file. Long-read technology shows more modest compression due to the presence of high-entropy signals. AVAILABILITY AND IMPLEMENTATION: The CRAM 3.0 specification is freely available from https://samtools.github.io/hts-specs/CRAMv3.pdf. The CRAM 3.1 improvements are available in a separate OpenSource HTScodecs library from https://github.com/samtools/htscodecs, and have been incorporated into HTSlib. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-8896640 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8896640 2022-03-07 CRAM 3.1: advances in the CRAM file format Bonfield, James K Bioinformatics Original Papers Oxford University Press 2022-01-06 /pmc/articles/PMC8896640/ /pubmed/34999766 http://dx.doi.org/10.1093/bioinformatics/btac010 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | CRAM 3.1: advances in the CRAM file format |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8896640/ https://www.ncbi.nlm.nih.gov/pubmed/34999766 http://dx.doi.org/10.1093/bioinformatics/btac010 |