Adaptive efficient compression of genomes
Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Comp...
Main authors: | Wandelt, Sebastian; Leser, Ulf |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3541066/ https://www.ncbi.nlm.nih.gov/pubmed/23146997 http://dx.doi.org/10.1186/1748-7188-7-30 |
_version_ | 1782255285820194816 |
---|---|
author | Wandelt, Sebastian Leser, Ulf |
author_facet | Wandelt, Sebastian Leser, Ulf |
author_sort | Wandelt, Sebastian |
collection | PubMed |
description | Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. However, memory requirements of the current algorithms are high and run times often are slow. In this paper, we propose an adaptive, parallel and highly efficient referential sequence compression method which allows fine-tuning of the trade-off between required memory and compression speed. When using 12 MB of memory, our method is for human genomes on-par with the best previous algorithms in terms of compression ratio (400:1) and compression speed. In contrast, it compresses a complete human genome in just 11 seconds when provided with 9 GB of main memory, which is almost three times faster than the best competitor while using less main memory. |
format | Online Article Text |
id | pubmed-3541066 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-35410662013-01-11 Adaptive efficient compression of genomes Wandelt, Sebastian Leser, Ulf Algorithms Mol Biol Research Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. However, memory requirements of the current algorithms are high and run times often are slow. In this paper, we propose an adaptive, parallel and highly efficient referential sequence compression method which allows fine-tuning of the trade-off between required memory and compression speed. When using 12 MB of memory, our method is for human genomes on-par with the best previous algorithms in terms of compression ratio (400:1) and compression speed. In contrast, it compresses a complete human genome in just 11 seconds when provided with 9 GB of main memory, which is almost three times faster than the best competitor while using less main memory. BioMed Central 2012-11-12 /pmc/articles/PMC3541066/ /pubmed/23146997 http://dx.doi.org/10.1186/1748-7188-7-30 Text en Copyright © 2012 Wandelt and Leser; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Wandelt, Sebastian Leser, Ulf Adaptive efficient compression of genomes |
title | Adaptive efficient compression of genomes |
title_full | Adaptive efficient compression of genomes |
title_fullStr | Adaptive efficient compression of genomes |
title_full_unstemmed | Adaptive efficient compression of genomes |
title_short | Adaptive efficient compression of genomes |
title_sort | adaptive efficient compression of genomes |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3541066/ https://www.ncbi.nlm.nih.gov/pubmed/23146997 http://dx.doi.org/10.1186/1748-7188-7-30 |
work_keys_str_mv | AT wandeltsebastian adaptiveefficientcompressionofgenomes AT leserulf adaptiveefficientcompressionofgenomes |
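
The description above characterizes referential compression as storing only the differences between an input and a known reference sequence. A minimal sketch of that general idea follows. It is not the authors' algorithm: the greedy k-mer matching, the function names `compress` and `decompress`, the `min_match` parameter, and the entry layout are all assumptions made for illustration, and nothing here reflects the adaptive or parallel aspects the record mentions.

```python
def compress(reference: str, target: str, min_match: int = 4):
    """Encode target as ("match", ref_pos, length) and ("raw", text) entries.

    Hypothetical illustration only: a naive greedy matcher, not the
    method described in the catalogued article.
    """
    k = min_match
    # Index every k-mer of the reference by its first occurrence.
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], i)

    entries = []
    raw = []  # characters that could not be matched against the reference
    pos = 0
    while pos < len(target):
        start = index.get(target[pos:pos + k])
        if start is None:
            # No k-mer match here: keep the character as a raw difference.
            raw.append(target[pos])
            pos += 1
            continue
        # Extend the match for as long as reference and target agree.
        length = k
        while (pos + length < len(target)
               and start + length < len(reference)
               and target[pos + length] == reference[start + length]):
            length += 1
        if raw:
            entries.append(("raw", "".join(raw)))
            raw = []
        entries.append(("match", start, length))
        pos += length
    if raw:
        entries.append(("raw", "".join(raw)))
    return entries


def decompress(reference: str, entries) -> str:
    """Rebuild the target from the reference and the stored entries."""
    parts = []
    for entry in entries:
        if entry[0] == "match":
            _, start, length = entry
            parts.append(reference[start:start + length])
        else:
            parts.append(entry[1])
    return "".join(parts)


if __name__ == "__main__":
    ref = "ACGTACGTTAGCCGATAGGCTA"
    tgt = "ACGTACGTTAGCCGTTAGGCTA"  # differs from ref by one base
    encoded = compress(ref, tgt)
    print(encoded)
    print(decompress(ref, encoded) == tgt)
```

When the target is very similar to the reference, most of it collapses into a few `match` entries and only the differing bases are stored as `raw` text, which is the intuition behind the high compression ratios the description reports for human genomes.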