WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data
BACKGROUND: The sequencing depth provided by high-throughput sequencing technologies has allowed a rise in the number of de novo sequenced genomes that could potentially be closed without further sequencing. However, genome scaffolding and closure require costly human supervision that often results...
Main Authors: | Farrant, Gregory K.; Hoebeke, Mark; Partensky, Frédéric; Andres, Gwendoline; Corre, Erwan; Garczarek, Laurence |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4559175/ https://www.ncbi.nlm.nih.gov/pubmed/26335184 http://dx.doi.org/10.1186/s12859-015-0705-y |
_version_ | 1782388735272288256 |
---|---|
author | Farrant, Gregory K.; Hoebeke, Mark; Partensky, Frédéric; Andres, Gwendoline; Corre, Erwan; Garczarek, Laurence |
author_sort | Farrant, Gregory K. |
collection | PubMed |
description | BACKGROUND: The sequencing depth provided by high-throughput sequencing technologies has allowed a rise in the number of de novo sequenced genomes that could potentially be closed without further sequencing. However, genome scaffolding and closure require costly human supervision that often results in genomes being published as drafts. A number of automatic scaffolders were recently released, which improved the global quality of genomes published in the last few years. Yet, none of them reach the efficiency of manual scaffolding. RESULTS: Here, we present an innovative semi-automatic scaffolder that additionally helps with chimerae resolution and generates valuable contig maps and outputs for manual improvement of the automatic scaffolding. This software was tested on the newly sequenced marine cyanobacterium Synechococcus sp. WH8103 as well as two reference datasets used in previous studies, Rhodobacter sphaeroides and Homo sapiens chromosome 14 (http://gage.cbcb.umd.edu/). The quality of resulting scaffolds was compared to that of three other stand-alone scaffolders: SSPACE, SOPRA and SCARPA. For all three model organisms, WiseScaffolder produced better results than other scaffolders in terms of contiguity statistics (number of genome fragments, N50, LG50, etc.) and, in the case of WH8103, the reliability of the scaffolds was confirmed by whole genome alignment against a closely related reference genome. We also propose an efficient computer-assisted strategy for manual improvement of the scaffolding, using outputs generated by WiseScaffolder, as well as for genome finishing that in our hands led to the circularization of the WH8103 genome. CONCLUSION: Altogether, WiseScaffolder proved more efficient than three other scaffolders for both prokaryotic and eukaryotic genomes and is thus likely applicable to most genome projects. 
The scaffolding pipeline described here should be of particular interest to biologists wishing to take advantage of the high added value of complete genomes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0705-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4559175 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4559175 2015-09-04 WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data. Farrant, Gregory K.; Hoebeke, Mark; Partensky, Frédéric; Andres, Gwendoline; Corre, Erwan; Garczarek, Laurence. BMC Bioinformatics. Methodology Article. BioMed Central 2015-09-03 /pmc/articles/PMC4559175/ /pubmed/26335184 http://dx.doi.org/10.1186/s12859-015-0705-y Text en © Farrant et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4559175/ https://www.ncbi.nlm.nih.gov/pubmed/26335184 http://dx.doi.org/10.1186/s12859-015-0705-y |
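
The abstract in this record compares scaffolders by contiguity statistics such as the number of genome fragments and N50. As a rough illustration only, the sketch below computes N50 under its commonly used definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length), plus the matching sequence count. The function names `n50` and `count_at_n50` and the example lengths are hypothetical; the article's own definitions of N50 and LG50 are not given in this record and may differ (for instance, by normalising to genome size).

```python
from typing import Iterable, Tuple


def _n50_and_count(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, number of sequences needed to reach half the total length).

    Assumption: the common assembly-based definition is used, i.e. the
    threshold is half of the summed input lengths, not half of a known
    genome size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first sequence that pushes the cumulative length to or past
        # half of the total defines N50.
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable for non-empty input")


def n50(lengths: Iterable[int]) -> int:
    """N50 of a set of contig or scaffold lengths."""
    return _n50_and_count(lengths)[0]


def count_at_n50(lengths: Iterable[int]) -> int:
    """Number of longest sequences needed to cover half the total length."""
    return _n50_and_count(lengths)[1]


if __name__ == "__main__":
    # Hypothetical scaffold lengths, in base pairs.
    scaffolds = [80, 70, 50, 40, 30, 20]
    print(n50(scaffolds), count_at_n50(scaffolds))
```

With statistics defined this way, a higher N50 and a lower sequence count both indicate a more contiguous assembly, which is the sense in which the abstract reports "better results".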