
GABAC: an arithmetic coding solution for genomic data


Bibliographic Details
Main authors: Voges, Jan, Paridaens, Tom, Müntefering, Fabian, Mainzer, Liudmila S, Bliss, Brian, Yang, Mingyu, Ochoa, Idoia, Fostier, Jan, Ostermann, Jörn, Hernaez, Mikel
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects: Applications Notes
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7141842/
https://www.ncbi.nlm.nih.gov/pubmed/31830243
http://dx.doi.org/10.1093/bioinformatics/btz922
author Voges, Jan
Paridaens, Tom
Müntefering, Fabian
Mainzer, Liudmila S
Bliss, Brian
Yang, Mingyu
Ochoa, Idoia
Fostier, Jan
Ostermann, Jörn
Hernaez, Mikel
collection PubMed
description MOTIVATION: In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard. This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data. RESULTS: We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and thus can serve as an extension for existing genomic compression solutions, such as CRAM. AVAILABILITY AND IMPLEMENTATION: The GABAC library is written in C++. We also provide a command line application which exercises all features provided by the library. GABAC can be downloaded from https://github.com/mitogen/gabac. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-7141842
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7141842 2020-04-13 GABAC: an arithmetic coding solution for genomic data. Bioinformatics. Applications Notes. Oxford University Press 2020-04-01 2019-12-12 /pmc/articles/PMC7141842/ /pubmed/31830243 http://dx.doi.org/10.1093/bioinformatics/btz922 Text en © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title GABAC: an arithmetic coding solution for genomic data
topic Applications Notes