Cargando…

Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers

Remarkable advancements in high-throughput gene sequencing technologies have led to an exponential growth in the number of sequenced genomes. However, unavailability of highly parallel and scalable de novo assembly algorithms have hindered biologists attempting to swiftly assemble high-quality compl...

Descripción completa

Detalles Bibliográficos
Autores principales: Mahadik, Kanak, Wright, Christopher, Kulkarni, Milind, Bagchi, Saurabh, Chaterji, Somali
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Nature Publishing Group UK 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6795807/
https://www.ncbi.nlm.nih.gov/pubmed/31619717
http://dx.doi.org/10.1038/s41598-019-51284-9
_version_ 1783459513885523968
author Mahadik, Kanak
Wright, Christopher
Kulkarni, Milind
Bagchi, Saurabh
Chaterji, Somali
author_facet Mahadik, Kanak
Wright, Christopher
Kulkarni, Milind
Bagchi, Saurabh
Chaterji, Somali
author_sort Mahadik, Kanak
collection PubMed
description Remarkable advancements in high-throughput gene sequencing technologies have led to an exponential growth in the number of sequenced genomes. However, unavailability of highly parallel and scalable de novo assembly algorithms have hindered biologists attempting to swiftly assemble high-quality complex genomes. Popular de Bruijn graph assemblers, such as IDBA-UD, generate high-quality assemblies by iterating over a set of k-values used in the construction of de Bruijn graphs (DBG). However, this process of sequentially iterating from small to large k-values slows down the process of assembly. In this paper, we propose ScalaDBG, which metamorphoses this sequential process, building DBGs for each distinct k-value in parallel. We develop an innovative mechanism to “patch” a higher k-valued graph with contigs generated from a lower k-valued graph. Moreover, ScalaDBG leverages multi-level parallelism, by both scaling up on all cores of a node, and scaling out to multiple nodes simultaneously. We demonstrate that ScalaDBG completes assembling the genome faster than IDBA-UD, but with similar accuracy on a variety of datasets (6.8X faster for one of the most complex genome in our dataset).
format Online
Article
Text
id pubmed-6795807
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-67958072019-10-25 Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers Mahadik, Kanak Wright, Christopher Kulkarni, Milind Bagchi, Saurabh Chaterji, Somali Sci Rep Article Remarkable advancements in high-throughput gene sequencing technologies have led to an exponential growth in the number of sequenced genomes. However, unavailability of highly parallel and scalable de novo assembly algorithms have hindered biologists attempting to swiftly assemble high-quality complex genomes. Popular de Bruijn graph assemblers, such as IDBA-UD, generate high-quality assemblies by iterating over a set of k-values used in the construction of de Bruijn graphs (DBG). However, this process of sequentially iterating from small to large k-values slows down the process of assembly. In this paper, we propose ScalaDBG, which metamorphoses this sequential process, building DBGs for each distinct k-value in parallel. We develop an innovative mechanism to “patch” a higher k-valued graph with contigs generated from a lower k-valued graph. Moreover, ScalaDBG leverages multi-level parallelism, by both scaling up on all cores of a node, and scaling out to multiple nodes simultaneously. We demonstrate that ScalaDBG completes assembling the genome faster than IDBA-UD, but with similar accuracy on a variety of datasets (6.8X faster for one of the most complex genome in our dataset). Nature Publishing Group UK 2019-10-16 /pmc/articles/PMC6795807/ /pubmed/31619717 http://dx.doi.org/10.1038/s41598-019-51284-9 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Mahadik, Kanak
Wright, Christopher
Kulkarni, Milind
Bagchi, Saurabh
Chaterji, Somali
Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers
title Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers
title_full Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers
title_fullStr Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers
title_full_unstemmed Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers
title_short Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers
title_sort scalable genome assembly through parallel de bruijn graph construction for multiple k-mers
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6795807/
https://www.ncbi.nlm.nih.gov/pubmed/31619717
http://dx.doi.org/10.1038/s41598-019-51284-9
work_keys_str_mv AT mahadikkanak scalablegenomeassemblythroughparalleldebruijngraphconstructionformultiplekmers
AT wrightchristopher scalablegenomeassemblythroughparalleldebruijngraphconstructionformultiplekmers
AT kulkarnimilind scalablegenomeassemblythroughparalleldebruijngraphconstructionformultiplekmers
AT bagchisaurabh scalablegenomeassemblythroughparalleldebruijngraphconstructionformultiplekmers
AT chaterjisomali scalablegenomeassemblythroughparalleldebruijngraphconstructionformultiplekmers