
Adaptive efficient compression of genomes


Bibliographic Details
Main authors: Wandelt, Sebastian, Leser, Ulf
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3541066/
https://www.ncbi.nlm.nih.gov/pubmed/23146997
http://dx.doi.org/10.1186/1748-7188-7-30
_version_ 1782255285820194816
author Wandelt, Sebastian
Leser, Ulf
author_facet Wandelt, Sebastian
Leser, Ulf
author_sort Wandelt, Sebastian
collection PubMed
description Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. While the experimental time and cost necessary to produce DNA sequences are decreasing, the computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, which store only the differences between a to-be-compressed input and a known reference sequence, have gained a lot of interest in this field. However, the memory requirements of the current algorithms are high and their run times are often long. In this paper, we propose an adaptive, parallel and highly efficient referential sequence compression method which allows fine-tuning of the trade-off between required memory and compression speed. When using 12 MB of memory, our method is on par with the best previous algorithms for human genomes in terms of compression ratio (400:1) and compression speed. In contrast, it compresses a complete human genome in just 11 seconds when provided with 9 GB of main memory, which is almost three times faster than the best competitor while using less main memory.
format Online
Article
Text
id pubmed-3541066
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-35410662013-01-11 Adaptive efficient compression of genomes Wandelt, Sebastian Leser, Ulf Algorithms Mol Biol Research Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. However, memory requirements of the current algorithms are high and run times often are slow. In this paper, we propose an adaptive, parallel and highly efficient referential sequence compression method which allows fine-tuning of the trade-off between required memory and compression speed. When using 12 MB of memory, our method is for human genomes on-par with the best previous algorithms in terms of compression ratio (400:1) and compression speed. In contrast, it compresses a complete human genome in just 11 seconds when provided with 9 GB of main memory, which is almost three times faster than the best competitor while using less main memory. BioMed Central 2012-11-12 /pmc/articles/PMC3541066/ /pubmed/23146997 http://dx.doi.org/10.1186/1748-7188-7-30 Text en Copyright © 2012 Wandelt and Leser; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Wandelt, Sebastian
Leser, Ulf
Adaptive efficient compression of genomes
title Adaptive efficient compression of genomes
title_full Adaptive efficient compression of genomes
title_fullStr Adaptive efficient compression of genomes
title_full_unstemmed Adaptive efficient compression of genomes
title_short Adaptive efficient compression of genomes
title_sort adaptive efficient compression of genomes
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3541066/
https://www.ncbi.nlm.nih.gov/pubmed/23146997
http://dx.doi.org/10.1186/1748-7188-7-30
work_keys_str_mv AT wandeltsebastian adaptiveefficientcompressionofgenomes
AT leserulf adaptiveefficientcompressionofgenomes
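
The abstract describes referential compression as storing only the differences between an input and a known reference sequence. The following Python sketch illustrates that general idea only; it is not the authors' adaptive, parallel method. The greedy k-mer matching strategy, the function names (`build_index`, `compress`, `decompress`), the entry encoding, and the parameter `k` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: an entry is either a (position, length) match into
# the reference, or a raw string of bases that could not be matched.
Entry = Union[Tuple[int, int], str]


def build_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index: Dict[str, List[int]] = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def compress(target: str, reference: str, k: int = 4) -> List[Entry]:
    """Greedily encode the target as matches against the reference plus literals."""
    index = build_index(reference, k)
    entries: List[Entry] = []
    literal: List[str] = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Try every reference position that shares the next k-mer and
        # extend each candidate match as far as it goes.
        for pos in index.get(target[i:i + k], []):
            length = 0
            while (i + length < len(target)
                   and pos + length < len(reference)
                   and target[i + length] == reference[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= k:
            # Flush pending literal bases before emitting the match.
            if literal:
                entries.append("".join(literal))
                literal = []
            entries.append((best_pos, best_len))
            i += best_len
        else:
            # No usable match: store this base verbatim.
            literal.append(target[i])
            i += 1
    if literal:
        entries.append("".join(literal))
    return entries


def decompress(entries: List[Entry], reference: str) -> str:
    """Rebuild the target from the entries and the same reference."""
    parts: List[str] = []
    for entry in entries:
        if isinstance(entry, str):
            parts.append(entry)
        else:
            pos, length = entry
            parts.append(reference[pos:pos + length])
    return "".join(parts)


if __name__ == "__main__":
    reference = "ACGTACGTTTGACCA"
    target = "ACGTACGATTGACCA"  # differs from the reference by one base
    encoded = compress(target, reference)
    print(encoded)
    print(decompress(encoded, reference) == target)
```

In this sketch the k-mer index is the main memory consumer, which loosely mirrors the memory versus speed trade-off mentioned in the abstract; how the published method actually manages that trade-off is not covered by this record.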