Exploiting sparseness in de novo genome assembly
BACKGROUND: The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments. METHODS: In this paper, we demonstrate that constructing a sparse assembly graph which stores only a small fraction of the obse...
Main Authors: | Ye, Chengxi; Ma, Zhanshan Sam; Cannon, Charles H; Pop, Mihai; Yu, Douglas W |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3369186/ https://www.ncbi.nlm.nih.gov/pubmed/22537038 http://dx.doi.org/10.1186/1471-2105-13-S6-S1 |
_version_ | 1782235038784421888 |
---|---|
author | Ye, Chengxi Ma, Zhanshan Sam Cannon, Charles H Pop, Mihai Yu, Douglas W |
author_facet | Ye, Chengxi Ma, Zhanshan Sam Cannon, Charles H Pop, Mihai Yu, Douglas W |
author_sort | Ye, Chengxi |
collection | PubMed |
description | BACKGROUND: The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments. METHODS: In this paper, we demonstrate that constructing a sparse assembly graph which stores only a small fraction of the observed k-mers as nodes and the links between these nodes allows the de novo assembly of even moderately-sized genomes (~500 M) on a typical laptop computer. RESULTS: We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, utilizing a new sparse k-mer graph structure evolved from the de Bruijn graph. We test our SparseAssembler with both simulated and real data, achieving ~90% memory savings and retaining high assembly accuracy, without sacrificing speed in comparison to existing de novo assemblers. |
format | Online Article Text |
id | pubmed-3369186 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-33691862012-06-07 Exploiting sparseness in de novo genome assembly Ye, Chengxi Ma, Zhanshan Sam Cannon, Charles H Pop, Mihai Yu, Douglas W BMC Bioinformatics Proceedings BACKGROUND: The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments. METHODS: In this paper, we demonstrate that constructing a sparse assembly graph which stores only a small fraction of the observed k-mers as nodes and the links between these nodes allows the de novo assembly of even moderately-sized genomes (~500 M) on a typical laptop computer. RESULTS: We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, utilizing a new sparse k-mer graph structure evolved from the de Bruijn graph. We test our SparseAssembler with both simulated and real data, achieving ~90% memory savings and retaining high assembly accuracy, without sacrificing speed in comparison to existing de novo assemblers. BioMed Central 2012-04-19 /pmc/articles/PMC3369186/ /pubmed/22537038 http://dx.doi.org/10.1186/1471-2105-13-S6-S1 Text en Copyright ©2012 Ye et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Ye, Chengxi Ma, Zhanshan Sam Cannon, Charles H Pop, Mihai Yu, Douglas W Exploiting sparseness in de novo genome assembly |
title | Exploiting sparseness in de novo genome assembly |
title_full | Exploiting sparseness in de novo genome assembly |
title_fullStr | Exploiting sparseness in de novo genome assembly |
title_full_unstemmed | Exploiting sparseness in de novo genome assembly |
title_short | Exploiting sparseness in de novo genome assembly |
title_sort | exploiting sparseness in de novo genome assembly |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3369186/ https://www.ncbi.nlm.nih.gov/pubmed/22537038 http://dx.doi.org/10.1186/1471-2105-13-S6-S1 |
work_keys_str_mv | AT yechengxi exploitingsparsenessindenovogenomeassembly AT mazhanshansam exploitingsparsenessindenovogenomeassembly AT cannoncharlesh exploitingsparsenessindenovogenomeassembly AT popmihai exploitingsparsenessindenovogenomeassembly AT yudouglasw exploitingsparsenessindenovogenomeassembly |
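
The description above says the sparse assembly graph stores only a small fraction of the observed k-mers as nodes, together with the links between them. The following Python sketch illustrates that idea in a simplified form; it is not SparseAssembler's implementation. The function names, the sampling interval `g`, and the rule for reusing an existing node's sampling phase are assumptions made for illustration only.

```python
from collections import defaultdict


def dense_kmers(reads, k):
    """Every distinct k-mer in the reads (what a dense node set would hold)."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def sparse_kmer_graph(reads, k, g):
    """Keep roughly one k-mer per g positions and link consecutive kept k-mers.

    Illustrative only: `g` (the skip length) is an assumed parameter.
    """
    nodes = set()
    # kept k-mer -> set of (next kept k-mer, the g bases that extend it)
    links = defaultdict(set)
    for read in reads:
        n = len(read) - k + 1  # number of k-mers in this read
        if n <= 0:
            continue
        # If one of the first g k-mers is already a node, start sampling
        # there so overlapping reads select the same k-mers.
        start = next(
            (i for i in range(min(g, n)) if read[i:i + k] in nodes), 0
        )
        prev = None
        for i in range(start, n, g):
            kmer = read[i:i + k]
            nodes.add(kmer)
            if prev is not None:
                prev_pos, prev_kmer = prev
                links[prev_kmer].add((kmer, read[prev_pos + k:i + k]))
            prev = (i, kmer)
    return nodes, links


if __name__ == "__main__":
    genome = "ACGTTGCAAGGCTTAACCGGATCGATTACGCTAGGCTA"
    reads = [genome[i:i + 20] for i in range(0, len(genome) - 19, 5)]
    nodes, links = sparse_kmer_graph(reads, k=7, g=4)
    print(len(dense_kmers(reads, 7)), "dense k-mers vs", len(nodes), "sparse nodes")
```

Each link carries the bases that separate two kept k-mers, so the sequence between nodes can still be spelled out even though the intermediate k-mers are never stored.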