
HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly

BACKGROUND: The rapid advancement of sequencing technologies has made it possible to regularly produce millions of high-quality reads from the DNA samples in the sequencing laboratories. To this end, the de Bruijn graph is a popular data structure in the genome assembly literature for efficient repr...


Bibliographic Details
Main Authors: Rahman, Md Mahfuzer; Sharker, Ratul; Biswas, Sajib; Rahman, M. Sohel
Format: Online Article Text
Language: English
Published: Hindawi 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5591975/
https://www.ncbi.nlm.nih.gov/pubmed/28929105
http://dx.doi.org/10.1155/2017/6120980
_version_ 1783262822981959680
author Rahman, Md Mahfuzer
Sharker, Ratul
Biswas, Sajib
Rahman, M. Sohel
author_facet Rahman, Md Mahfuzer
Sharker, Ratul
Biswas, Sajib
Rahman, M. Sohel
author_sort Rahman, Md Mahfuzer
collection PubMed
description BACKGROUND: The rapid advancement of sequencing technologies has made it possible for sequencing laboratories to routinely produce millions of high-quality reads from DNA samples. The de Bruijn graph is a popular data structure in the genome assembly literature for representing and processing such data efficiently. Because a de Bruijn graph can contain a very large number of nodes, memory consumption and running time are the main barriers. This area has therefore received significant attention in the contemporary literature. RESULTS: In this paper, we present an approach called HaVec that attempts to balance memory consumption against running time. HaVec stores the de Bruijn graph in a hash table combined with an auxiliary vector data structure, thereby improving both total memory usage and running time. A critical and noteworthy feature of HaVec is that it produces no false positive errors. CONCLUSIONS: In general, graph construction takes the major share of the time in an assembly process. HaVec can be seen as a significant advancement in this respect. We anticipate that HaVec will be extremely useful in de Bruijn graph-based genome assembly.
format Online
Article
Text
id pubmed-5591975
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-5591975 2017-09-19 HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly Rahman, Md Mahfuzer Sharker, Ratul Biswas, Sajib Rahman, M. Sohel Int J Genomics Research Article Hindawi 2017 2017-08-27 /pmc/articles/PMC5591975/ /pubmed/28929105 http://dx.doi.org/10.1155/2017/6120980 Text en Copyright © 2017 Md Mahfuzer Rahman et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Rahman, Md Mahfuzer
Sharker, Ratul
Biswas, Sajib
Rahman, M. Sohel
HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly
title HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly
title_full HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly
title_fullStr HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly
title_full_unstemmed HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly
title_short HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly
title_sort havec: an efficient de bruijn graph construction algorithm for genome assembly
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5591975/
https://www.ncbi.nlm.nih.gov/pubmed/28929105
http://dx.doi.org/10.1155/2017/6120980
work_keys_str_mv AT rahmanmdmahfuzer havecanefficientdebruijngraphconstructionalgorithmforgenomeassembly
AT sharkerratul havecanefficientdebruijngraphconstructionalgorithmforgenomeassembly
AT biswassajib havecanefficientdebruijngraphconstructionalgorithmforgenomeassembly
AT rahmanmsohel havecanefficientdebruijngraphconstructionalgorithmforgenomeassembly
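
The abstract in this record describes storing a de Bruijn graph in a hash table with an auxiliary vector. As a rough illustration of that general idea only, the Python sketch below keys a dictionary by k-mer and keeps each k-mer's successor bases in a small list. It is not HaVec's actual data layout or algorithm, which this record does not specify; the function names `build_de_bruijn` and `neighbors` and the sample reads are hypothetical. Because the dictionary stores each k-mer exactly, a lookup never reports a k-mer that was not in the reads, which is the kind of "no false positive" behaviour an exact table gives compared with a probabilistic filter.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map every k-mer in the reads to the sorted bases that follow it.

    Illustrative only: a plain dict plus per-node lists, not HaVec's layout.
    """
    successors = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            successors[kmer]  # touch the key so terminal k-mers are nodes too
            if i + k < len(read):
                successors[kmer].add(read[i + k])
    return {kmer: sorted(bases) for kmer, bases in successors.items()}


def neighbors(graph, kmer):
    """Return the k-mers reachable from `kmer` by shifting in one base."""
    return [kmer[1:] + base for base in graph.get(kmer, [])]


if __name__ == "__main__":
    graph = build_de_bruijn(["ACGTA", "CGTT"], k=3)
    for node in sorted(graph):
        print(node, "->", neighbors(graph, node))
```

A real assembler would also handle reverse complements and encode k-mers compactly; both are left out here to keep the sketch short.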