Cargando…
Space-efficient and exact de Bruijn graph representation based on a Bloom filter
BACKGROUND: The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memo...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2013
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3848682/ https://www.ncbi.nlm.nih.gov/pubmed/24040893 http://dx.doi.org/10.1186/1748-7188-8-22 |
_version_ | 1782293801161719808 |
---|---|
author | Chikhi, Rayan Rizk, Guillaume |
author_facet | Chikhi, Rayan Rizk, Guillaume |
author_sort | Chikhi, Rayan |
collection | PubMed |
description | BACKGROUND: The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (≥30 GB). RESULTS: We propose a new encoding of the de Bruijn graph, which occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure to remove critical false positives. CONCLUSIONS: An assembly software implementing this structure, Minia, performed a complete de novo assembly of human genome short reads using 5.7 GB of memory in 23 hours. |
format | Online Article Text |
id | pubmed-3848682 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-38486822013-12-05 Space-efficient and exact de Bruijn graph representation based on a Bloom filter Chikhi, Rayan Rizk, Guillaume Algorithms Mol Biol Research BACKGROUND: The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (≥30 GB). RESULTS: We propose a new encoding of the de Bruijn graph, which occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure to remove critical false positives. CONCLUSIONS: An assembly software implementing this structure, Minia, performed a complete de novo assembly of human genome short reads using 5.7 GB of memory in 23 hours. BioMed Central 2013-09-16 /pmc/articles/PMC3848682/ /pubmed/24040893 http://dx.doi.org/10.1186/1748-7188-8-22 Text en Copyright © 2013 Chikhi and Rizk; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Chikhi, Rayan Rizk, Guillaume Space-efficient and exact de Bruijn graph representation based on a Bloom filter |
title | Space-efficient and exact de Bruijn graph representation based on a Bloom filter |
title_full | Space-efficient and exact de Bruijn graph representation based on a Bloom filter |
title_fullStr | Space-efficient and exact de Bruijn graph representation based on a Bloom filter |
title_full_unstemmed | Space-efficient and exact de Bruijn graph representation based on a Bloom filter |
title_short | Space-efficient and exact de Bruijn graph representation based on a Bloom filter |
title_sort | space-efficient and exact de bruijn graph representation based on a bloom filter |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3848682/ https://www.ncbi.nlm.nih.gov/pubmed/24040893 http://dx.doi.org/10.1186/1748-7188-8-22 |
work_keys_str_mv | AT chikhirayan spaceefficientandexactdebruijngraphrepresentationbasedonabloomfilter AT rizkguillaume spaceefficientandexactdebruijngraphrepresentationbasedonabloomfilter |