GABAC: an arithmetic coding solution for genomic data
MOTIVATION: In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)...
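Main authors: | Voges, Jan; Paridaens, Tom; Müntefering, Fabian; Mainzer, Liudmila S; Bliss, Brian; Yang, Mingyu; Ochoa, Idoia; Fostier, Jan; Ostermann, Jörn; Hernaez, Mikel |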
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | Applications Notes |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7141842/ https://www.ncbi.nlm.nih.gov/pubmed/31830243 http://dx.doi.org/10.1093/bioinformatics/btz922 |
author | Voges, Jan Paridaens, Tom Müntefering, Fabian Mainzer, Liudmila S Bliss, Brian Yang, Mingyu Ochoa, Idoia Fostier, Jan Ostermann, Jörn Hernaez, Mikel |
author_sort | Voges, Jan |
collection | PubMed |
description | MOTIVATION: In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard. This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data. RESULTS: We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and thus can serve as an extension for existing genomic compression solutions, such as CRAM. AVAILABILITY AND IMPLEMENTATION: The GABAC library is written in C++. We also provide a command line application which exercises all features provided by the library. GABAC can be downloaded from https://github.com/mitogen/gabac. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7141842 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7141842 2020-04-13 GABAC: an arithmetic coding solution for genomic data. Bioinformatics, Applications Notes. Oxford University Press 2020-04-01 2019-12-12 /pmc/articles/PMC7141842/ /pubmed/31830243 http://dx.doi.org/10.1093/bioinformatics/btz922 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | GABAC: an arithmetic coding solution for genomic data |
topic | Applications Notes |