
Sealer: a scalable gap-closing application for finishing draft genomes


Bibliographic Details
Main authors: Paulino, Daniel; Warren, René L.; Vandervalk, Benjamin P.; Raymond, Anthony; Jackman, Shaun D.; Birol, Inanç
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4515008/
https://www.ncbi.nlm.nih.gov/pubmed/26209068
http://dx.doi.org/10.1186/s12859-015-0663-4
_version_ 1782382857314893824
author Paulino, Daniel
Warren, René L.
Vandervalk, Benjamin P.
Raymond, Anthony
Jackman, Shaun D.
Birol, Inanç
author_facet Paulino, Daniel
Warren, René L.
Vandervalk, Benjamin P.
Raymond, Anthony
Jackman, Shaun D.
Birol, Inanç
author_sort Paulino, Daniel
collection PubMed
description BACKGROUND: While next-generation sequencing technologies have made sequencing genomes faster and more affordable, deciphering the complete genome sequence of an organism remains a significant bioinformatics challenge, especially for large genomes. Low sequence coverage, repetitive elements and short read length make de novo genome assembly difficult, often resulting in sequence and/or fragment “gaps” – uncharacterized nucleotide (N) stretches of unknown or estimated lengths. Some of these gaps can be closed by re-processing latent information in the raw reads. Even though there are several tools for closing gaps, they do not easily scale up to processing billion base pair genomes. RESULTS: Here we describe Sealer, a tool designed to close gaps within assembly scaffolds by navigating de Bruijn graphs represented by space-efficient Bloom filter data structures. We demonstrate how it scales to successfully close 50.8 % and 13.8 % of gaps in human (3 Gbp) and white spruce (20 Gbp) draft assemblies in under 30 and 27 h, respectively – a feat that is not possible with other leading tools with the breadth of data used in our study. CONCLUSION: Sealer is an automated finishing application that uses the succinct Bloom filter representation of a de Bruijn graph to close gaps in draft assemblies, including that of very large genomes. We expect Sealer to have broad utility for finishing genomes across the tree of life, from bacterial genomes to large plant genomes and beyond. Sealer is available for download at https://github.com/bcgsc/abyss/tree/sealer-release. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0663-4) contains supplementary material, which is available to authorized users.
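The description above says Sealer closes gaps by navigating a de Bruijn graph stored in a Bloom filter. The following is a minimal sketch of that general idea, not Sealer's actual implementation: the class and function names (`BloomFilter`, `load_reads`, `close_gap`), the hashing scheme, and the parameters are all hypothetical. It uses a one-directional breadth-first search, ignores reverse complements, and does not check that the path found is unique.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Bit array queried with several salted hashes (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive one bit position per hash by salting BLAKE2b differently.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def load_reads(reads, k, bloom):
    """Insert every k-mer of every read into the Bloom filter."""
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])


def close_gap(left_flank, right_flank, k, bloom, max_gap):
    """Search for a k-mer path joining the two flanks of a gap.

    Starts at the last k-mer of the left flank and extends one base at a
    time, keeping only k-mers present in the Bloom filter. Returns the
    bases that fill the gap, or None if no path is found within max_gap.
    """
    start, goal = left_flank[-k:], right_flank[:k]
    queue = deque([(start, "")])  # (current k-mer, bases appended so far)
    seen = {start}
    while queue:
        kmer, path = queue.popleft()
        if kmer == goal and len(path) >= k:
            return path[:-k]  # drop the k bases that belong to the right flank
        if len(path) >= max_gap + k:
            continue  # path is already longer than the allowed gap size
        for base in "ACGT":
            nxt = kmer[1:] + base
            if nxt not in seen and nxt in bloom:
                seen.add(nxt)
                queue.append((nxt, path + base))
    return None
```

The graph is implicit: no edges are stored, and the neighbours of a k-mer are found by querying its four possible one-base extensions against the filter. That is the property that keeps memory use low. The cost is that Bloom filter false positives can introduce spurious branches, which a production tool has to guard against.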
format Online
Article
Text
id pubmed-4515008
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4515008 2015-07-26 Sealer: a scalable gap-closing application for finishing draft genomes Paulino, Daniel Warren, René L. Vandervalk, Benjamin P. Raymond, Anthony Jackman, Shaun D. Birol, Inanç BMC Bioinformatics Software
BioMed Central 2015-07-25 /pmc/articles/PMC4515008/ /pubmed/26209068 http://dx.doi.org/10.1186/s12859-015-0663-4 Text en © Paulino et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Paulino, Daniel
Warren, René L.
Vandervalk, Benjamin P.
Raymond, Anthony
Jackman, Shaun D.
Birol, Inanç
Sealer: a scalable gap-closing application for finishing draft genomes
title Sealer: a scalable gap-closing application for finishing draft genomes
title_full Sealer: a scalable gap-closing application for finishing draft genomes
title_fullStr Sealer: a scalable gap-closing application for finishing draft genomes
title_full_unstemmed Sealer: a scalable gap-closing application for finishing draft genomes
title_short Sealer: a scalable gap-closing application for finishing draft genomes
title_sort sealer: a scalable gap-closing application for finishing draft genomes
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4515008/
https://www.ncbi.nlm.nih.gov/pubmed/26209068
http://dx.doi.org/10.1186/s12859-015-0663-4
work_keys_str_mv AT paulinodaniel sealerascalablegapclosingapplicationforfinishingdraftgenomes
AT warrenrenel sealerascalablegapclosingapplicationforfinishingdraftgenomes
AT vandervalkbenjaminp sealerascalablegapclosingapplicationforfinishingdraftgenomes
AT raymondanthony sealerascalablegapclosingapplicationforfinishingdraftgenomes
AT jackmanshaund sealerascalablegapclosingapplicationforfinishingdraftgenomes
AT birolinanc sealerascalablegapclosingapplicationforfinishingdraftgenomes