WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data

BACKGROUND: The sequencing depth provided by high-throughput sequencing technologies has allowed a rise in the number of de novo sequenced genomes that could potentially be closed without further sequencing. However, genome scaffolding and closure require costly human supervision that often results...

Bibliographic Details
Main Authors: Farrant, Gregory K., Hoebeke, Mark, Partensky, Frédéric, Andres, Gwendoline, Corre, Erwan, Garczarek, Laurence
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4559175/
https://www.ncbi.nlm.nih.gov/pubmed/26335184
http://dx.doi.org/10.1186/s12859-015-0705-y
author Farrant, Gregory K.
Hoebeke, Mark
Partensky, Frédéric
Andres, Gwendoline
Corre, Erwan
Garczarek, Laurence
collection PubMed
description BACKGROUND: The sequencing depth provided by high-throughput sequencing technologies has allowed a rise in the number of de novo sequenced genomes that could potentially be closed without further sequencing. However, genome scaffolding and closure require costly human supervision that often results in genomes being published as drafts. A number of automatic scaffolders were recently released, which improved the global quality of genomes published in the last few years. Yet, none of them reach the efficiency of manual scaffolding. RESULTS: Here, we present an innovative semi-automatic scaffolder that additionally helps with chimerae resolution and generates valuable contig maps and outputs for manual improvement of the automatic scaffolding. This software was tested on the newly sequenced marine cyanobacterium Synechococcus sp. WH8103 as well as two reference datasets used in previous studies, Rhodobacter sphaeroides and Homo sapiens chromosome 14 (http://gage.cbcb.umd.edu/). The quality of resulting scaffolds was compared to that of three other stand-alone scaffolders: SSPACE, SOPRA and SCARPA. For all three model organisms, WiseScaffolder produced better results than other scaffolders in terms of contiguity statistics (number of genome fragments, N50, LG50, etc.) and, in the case of WH8103, the reliability of the scaffolds was confirmed by whole genome alignment against a closely related reference genome. We also propose an efficient computer-assisted strategy for manual improvement of the scaffolding, using outputs generated by WiseScaffolder, as well as for genome finishing that in our hands led to the circularization of the WH8103 genome. CONCLUSION: Altogether, WiseScaffolder proved more efficient than three other scaffolders for both prokaryotic and eukaryotic genomes and is thus likely applicable to most genome projects. 
The scaffolding pipeline described here should be of particular interest to biologists wishing to take advantage of the high added value of complete genomes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0705-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4559175
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4559175 2015-09-04 WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data Farrant, Gregory K. Hoebeke, Mark Partensky, Frédéric Andres, Gwendoline Corre, Erwan Garczarek, Laurence BMC Bioinformatics Methodology Article BioMed Central 2015-09-03 /pmc/articles/PMC4559175/ /pubmed/26335184 http://dx.doi.org/10.1186/s12859-015-0705-y Text en © Farrant et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4559175/
https://www.ncbi.nlm.nih.gov/pubmed/26335184
http://dx.doi.org/10.1186/s12859-015-0705-y