Exploiting sparseness in de novo genome assembly

BACKGROUND: The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments. METHODS: In this paper, we demonstrate that constructing a sparse assembly graph which stores only a small fraction of the obse...

Bibliographic Details
Main Authors: Ye, Chengxi, Ma, Zhanshan Sam, Cannon, Charles H, Pop, Mihai, Yu, Douglas W
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3369186/
https://www.ncbi.nlm.nih.gov/pubmed/22537038
http://dx.doi.org/10.1186/1471-2105-13-S6-S1
_version_ 1782235038784421888
author Ye, Chengxi
Ma, Zhanshan Sam
Cannon, Charles H
Pop, Mihai
Yu, Douglas W
author_facet Ye, Chengxi
Ma, Zhanshan Sam
Cannon, Charles H
Pop, Mihai
Yu, Douglas W
author_sort Ye, Chengxi
collection PubMed
description BACKGROUND: The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments. METHODS: In this paper, we demonstrate that constructing a sparse assembly graph which stores only a small fraction of the observed k-mers as nodes and the links between these nodes allows the de novo assembly of even moderately-sized genomes (~500 M) on a typical laptop computer. RESULTS: We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, utilizing a new sparse k-mer graph structure evolved from the de Bruijn graph. We test our SparseAssembler with both simulated and real data, achieving ~90% memory savings and retaining high assembly accuracy, without sacrificing speed in comparison to existing de novo assemblers.
format Online
Article
Text
id pubmed-3369186
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-33691862012-06-07 Exploiting sparseness in de novo genome assembly Ye, Chengxi Ma, Zhanshan Sam Cannon, Charles H Pop, Mihai Yu, Douglas W BMC Bioinformatics Proceedings BACKGROUND: The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments. METHODS: In this paper, we demonstrate that constructing a sparse assembly graph which stores only a small fraction of the observed k-mers as nodes and the links between these nodes allows the de novo assembly of even moderately-sized genomes (~500 M) on a typical laptop computer. RESULTS: We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, utilizing a new sparse k-mer graph structure evolved from the de Bruijn graph. We test our SparseAssembler with both simulated and real data, achieving ~90% memory savings and retaining high assembly accuracy, without sacrificing speed in comparison to existing de novo assemblers. BioMed Central 2012-04-19 /pmc/articles/PMC3369186/ /pubmed/22537038 http://dx.doi.org/10.1186/1471-2105-13-S6-S1 Text en Copyright ©2012 Ye et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Ye, Chengxi
Ma, Zhanshan Sam
Cannon, Charles H
Pop, Mihai
Yu, Douglas W
Exploiting sparseness in de novo genome assembly
title Exploiting sparseness in de novo genome assembly
title_full Exploiting sparseness in de novo genome assembly
title_fullStr Exploiting sparseness in de novo genome assembly
title_full_unstemmed Exploiting sparseness in de novo genome assembly
title_short Exploiting sparseness in de novo genome assembly
title_sort exploiting sparseness in de novo genome assembly
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3369186/
https://www.ncbi.nlm.nih.gov/pubmed/22537038
http://dx.doi.org/10.1186/1471-2105-13-S6-S1
work_keys_str_mv AT yechengxi exploitingsparsenessindenovogenomeassembly
AT mazhanshansam exploitingsparsenessindenovogenomeassembly
AT cannoncharlesh exploitingsparsenessindenovogenomeassembly
AT popmihai exploitingsparsenessindenovogenomeassembly
AT yudouglasw exploitingsparsenessindenovogenomeassembly
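
The description above says that the sparse assembly graph stores only a small fraction of the observed k-mers as nodes, plus the links between those nodes. The following is a minimal sketch of that idea on a single error-free sequence, not the SparseAssembler implementation: the function names, the skip parameter `g`, and the fixed every-g-th sampling rule are assumptions made for illustration. A real assembler must also handle many overlapping reads, reverse complements, and sequencing errors.

```python
def dense_kmers(seq, k):
    """Distinct k-mers of seq, i.e. the nodes a dense de Bruijn graph would store."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sparse_kmer_graph(seq, k, g):
    """Keep every g-th k-mer as a node and link each node to the next one.

    Hypothetical simplification: sampling starts at position 0 and uses a
    fixed stride g. Each link records the g bases that extend one node to
    reach the following node.
    """
    starts = range(0, len(seq) - k + 1, g)
    nodes = [seq[i:i + k] for i in starts]
    links = {}
    for a, b, i in zip(nodes, nodes[1:], starts):
        extension = seq[i + k:i + k + g]  # bases between node a and node b
        links.setdefault(a, set()).add((extension, b))
    return set(nodes), links


if __name__ == "__main__":
    seq = "ACGTACGGTCAATGCC"  # toy sequence, not real data
    nodes, links = sparse_kmer_graph(seq, k=4, g=3)
    print(len(dense_kmers(seq, 4)), "dense nodes vs", len(nodes), "sparse nodes")
```

In this sketch the number of stored nodes shrinks roughly by a factor of `g`, at the cost of storing a short extension string on each link; the memory figures reported in the description come from the authors' own implementation and are not reproduced by this toy example.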