
Tackling the Challenges of FASTQ Referential Compression

The exponential growth of genomic data has recently motivated the development of compression algorithms to tackle the storage capacity limitations in bioinformatics centers. Referential compressors could theoretically achieve much higher compression than their non-referential counterparts; however, the latest tools have not yet been able to harness this potential. To reach that goal, an efficient encoding model is needed to represent the differences between the input and the reference. In this article, we introduce a novel approach for referential compression of FASTQ files. The core of our compression scheme is a referential compressor based on the combination of local alignments with binary encoding optimized for long reads. Here we present the algorithms and performance tests developed for our read compression algorithm, named UdeACompress. Our compressor achieved the best results when compressing long reads and competitive compression ratios for shorter reads when compared to the best programs in the state of the art. As an added value, it also showed reasonable execution times and memory consumption in comparison with similar tools.

Bibliographic Details
Main authors:	Guerra, Aníbal, Lotero, Jaime, Aedo, José Édinson, Isaza, Sebastián
Format:	Online Article Text
Language:	English
Published:	SAGE Publications 2019
Journal:	Bioinform Biol Insights
Publication date:	2019-02-14
Subjects:	Original Research
Collection:	PubMed
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6376532/ https://www.ncbi.nlm.nih.gov/pubmed/30792576 http://dx.doi.org/10.1177/1177932218821373
License:	© The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
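The abstract's central idea is to store a read as its differences from a reference and then pack the result in binary. A minimal Python sketch can illustrate that general idea. It is not UdeACompress's algorithm. The sketch makes several assumptions: a naive substitution-only scan stands in for a real local aligner, insertions and deletions are ignored, reads contain only A, C, G and T, and quality scores and read headers are not handled. The function names (`best_offset`, `encode_read`, `decode_read`, `pack_2bit`) and the 2-bit base code are hypothetical.

```python
from typing import List, Tuple

# Hypothetical 2-bit code per nucleotide (assumption: reads contain only A, C, G, T).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def best_offset(read: str, reference: str) -> int:
    """Return the first reference offset where the read has the fewest mismatches.

    Naive substitution-only scan standing in for a real local aligner (assumption).
    """
    if len(read) > len(reference):
        raise ValueError("read must not be longer than the reference")
    best, best_cost = 0, len(read) + 1
    for off in range(len(reference) - len(read) + 1):
        window = reference[off:off + len(read)]
        cost = sum(1 for a, b in zip(read, window) if a != b)
        if cost < best_cost:
            best, best_cost = off, cost
    return best


def encode_read(read: str, reference: str) -> Tuple[int, List[Tuple[int, str]]]:
    """Encode a read as (offset, [(position_in_read, read_base), ...])."""
    off = best_offset(read, reference)
    window = reference[off:off + len(read)]
    diffs = [(i, r) for i, (r, ref) in enumerate(zip(read, window)) if r != ref]
    return off, diffs


def decode_read(offset: int, diffs: List[Tuple[int, str]], length: int,
                reference: str) -> str:
    """Rebuild the read by copying the reference window and applying substitutions."""
    bases = list(reference[offset:offset + length])
    for i, base in diffs:
        bases[i] = base
    return "".join(bases)


def pack_2bit(seq: str) -> bytes:
    """Pack an ACGT string four bases per byte, first base in the high bits.

    The last byte is zero-padded, so the original length must be stored separately.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * (3 - j))
        out.append(byte)
    return bytes(out)
```

For example, with `reference = "ACGTACGTTAGCCGATAGGCT"`, `encode_read("TAGCGGATA", reference)` returns `(8, [(4, "G")])`. That is one offset and one substitution instead of nine bases, and `decode_read` restores the read exactly. Bases that still have to be stored, such as unaligned reads or substituted bases, can be packed with `pack_2bit` at two bits each.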

