
deBGR: an efficient and near-exact representation of the weighted de Bruijn graph



Bibliographic Details
Main authors: Pandey, Prashant; Bender, Michael A; Johnson, Rob; Patro, Rob
Format: Online article, text
Language: English
Published: Oxford University Press, 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870571/
https://www.ncbi.nlm.nih.gov/pubmed/28881995
http://dx.doi.org/10.1093/bioinformatics/btx261
Abstract

MOTIVATION: Almost all de novo short-read genome and transcriptome assemblers start by building a representation of the de Bruijn graph of the reads they are given as input. Even when other approaches are used for subsequent assembly (e.g. when one is using ‘long read’ technologies like those offered by PacBio or Oxford Nanopore), efficient k-mer processing is still crucial for accurate assembly, and state-of-the-art long-read error-correction methods use de Bruijn graphs. Because of the centrality of de Bruijn graphs, researchers have proposed numerous methods for representing them compactly. Some of these proposals sacrifice accuracy to save space. Further, none of these methods store abundance information, i.e. the number of times that each k-mer occurs, which is key in transcriptome assemblers.

RESULTS: We present a method for compactly representing the weighted de Bruijn graph (i.e. with abundance information) with essentially no errors. Our representation yields zero errors while increasing the space requirements by less than 18–28% compared to the approximate de Bruijn graph representation in Squeakr. Our technique is based on a simple invariant that all weighted de Bruijn graphs must satisfy, and hence is likely to be of general interest and applicable in most weighted de Bruijn graph-based systems.

AVAILABILITY AND IMPLEMENTATION: https://github.com/splatlab/debgr.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
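The abstract says the method rests on "a simple invariant that all weighted de Bruijn graphs must satisfy" but does not state it. Purely as an illustration, the sketch below checks one invariant of that kind, a flow-balance condition, under these assumptions: edges are (k+1)-mers weighted by their abundance, k-mers are taken on the forward strand only (no canonical k-mers), and the number of reads starting or ending at each k-mer is known. Every occurrence of a k-mer x either has an incoming edge or starts a read, and either has an outgoing edge or ends a read, so in(x) - out(x) must equal ends(x) - starts(x). The function names are hypothetical; this is not the deBGR implementation.

```python
from collections import Counter


def count_substrings(reads: list[str], length: int) -> Counter:
    """Count every substring of `length` bases (forward strand only, no canonical k-mers)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - length + 1):
            counts[read[i:i + length]] += 1
    return counts


def boundary_balance(reads: list[str], k: int) -> Counter:
    """Per k-mer x: (#reads whose last k-mer is x) - (#reads whose first k-mer is x)."""
    balance: Counter = Counter()
    for read in reads:
        if len(read) >= k:
            balance[read[-k:]] += 1
            balance[read[:k]] -= 1
    return balance


def flow_violations(edges: Counter, balance: Counter, k: int) -> list[str]:
    """Return the k-mers x for which in(x) - out(x) != balance[x].

    Edges are (k+1)-mers: an edge e leaves node e[:k] and enters node e[1:],
    carrying weight edges[e].
    """
    in_flow: Counter = Counter()
    out_flow: Counter = Counter()
    for edge, weight in edges.items():
        out_flow[edge[:k]] += weight
        in_flow[edge[1:]] += weight
    nodes = set(in_flow) | set(out_flow) | set(balance)
    return sorted(x for x in nodes if in_flow[x] - out_flow[x] != balance[x])


if __name__ == "__main__":
    reads = ["ACGTACGGA", "CGTACGGAT", "TTACGTAC"]
    k = 4
    edges = count_substrings(reads, k + 1)  # exact (k+1)-mer abundances
    balance = boundary_balance(reads, k)
    print(flow_violations(edges, balance, k))  # [] : exact counts satisfy the invariant

    noisy = edges.copy()
    noisy["GGATT"] += 1  # hypothetical false-positive edge, as an approximate filter might report
    print(flow_violations(noisy, balance, k))  # ['GATT', 'GGAT']
```

A single spurious (k+1)-mer disturbs the balance at both of its endpoint k-mers, which is what makes a check like this useful for spotting candidate errors in an approximate representation. How deBGR itself applies its invariant to reach zero errors is described in the paper and is not reproduced by this sketch.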
Journal: Bioinformatics
Section: ISMB/ECCB 2017: The 25th Annual Conference on Intelligent Systems for Molecular Biology, held jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017
Publication date: 2017-07-15 (published online 2017-07-12)
© The Author 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com