
Combining de novo and reference-guided assembly with scaffold_builder

Genome sequencing has become routine; however, genome assembly remains a challenge despite the computational advances of the last decade. In particular, the abundance of repeat elements in genomes makes it difficult to assemble them into a single complete sequence. Identical repeats shorter tha...

Bibliographic Details
Main Authors: Silva, Genivaldo GZ, Dutilh, Bas E, Matthews, T David, Elkins, Keri, Schmieder, Robert, Dinsdale, Elizabeth A, Edwards, Robert A
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4177539/
https://www.ncbi.nlm.nih.gov/pubmed/24267787
http://dx.doi.org/10.1186/1751-0473-8-23
author Silva, Genivaldo GZ
Dutilh, Bas E
Matthews, T David
Elkins, Keri
Schmieder, Robert
Dinsdale, Elizabeth A
Edwards, Robert A
collection PubMed
description Genome sequencing has become routine; however, genome assembly remains a challenge despite the computational advances of the last decade. In particular, the abundance of repeat elements in genomes makes it difficult to assemble them into a single complete sequence. Identical repeats shorter than the average read length can generally be assembled without issue. However, longer repeats such as ribosomal RNA operons cannot be accurately assembled using existing tools. The application Scaffold_builder was designed to generate scaffolds – super contigs of sequences joined by N-bases – based on their similarity to a closely related reference sequence. This approach is independent of mate-pair information and can complement it during genome assembly, e.g. when mate-pairs are not available or have already been exploited. Scaffold_builder was evaluated using simulated pyrosequencing reads of the bacterial genomes Escherichia coli 042, Lactobacillus salivarius UCC118 and Salmonella enterica subsp. enterica serovar Typhi str. P-stx-12. Moreover, we sequenced two genomes, Salmonella enterica serovar Typhimurium LT2 G455 and Salmonella enterica serovar Typhimurium SDT1291, and show that Scaffold_builder decreases the number of contig sequences by 53% while more than doubling their average length. Scaffold_builder is written in Python and is available at http://edwards.sdsu.edu/scaffold_builder. A web-based implementation is also provided that allows users to submit a reference genome and a set of contigs to be scaffolded.
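The idea of a scaffold as contigs joined by N-bases in reference order can be illustrated with a minimal Python sketch. This is not Scaffold_builder's code: the `Placement` structure, the `build_scaffold` function and the `min_gap` parameter are hypothetical names, the reference coordinates are assumed to have been computed beforehand by some alignment step, and each contig is assumed to cover as many reference bases as its own length.

```python
from typing import List, NamedTuple


class Placement(NamedTuple):
    """A contig and where it aligns on the reference (hypothetical structure)."""
    name: str
    sequence: str
    ref_start: int  # 0-based start coordinate on the reference (assumed known)


def build_scaffold(placements: List[Placement], min_gap: int = 1) -> str:
    """Join contigs in reference order, filling the gaps between them with N."""
    ordered = sorted(placements, key=lambda p: p.ref_start)
    if not ordered:
        return ""
    scaffold = ordered[0].sequence
    # Assumption: a contig spans exactly len(sequence) bases on the reference.
    prev_end = ordered[0].ref_start + len(ordered[0].sequence)
    for p in ordered[1:]:
        # Simplification: overlapping or adjacent contigs are not merged,
        # they are just separated by min_gap N-bases.
        gap = max(p.ref_start - prev_end, min_gap)
        scaffold += "N" * gap + p.sequence
        prev_end = max(prev_end, p.ref_start + len(p.sequence))
    return scaffold


contigs = [
    Placement("contig_2", "GGCC", ref_start=10),
    Placement("contig_1", "ACGT", ref_start=0),
]
scaffold = build_scaffold(contigs)
print(scaffold)  # ACGTNNNNNNGGCC
```

Here the six N-bases stand for the reference positions 4 to 9 that neither contig covers. A real tool would also have to handle reverse-complemented contigs, overlaps and ambiguous placements, which this sketch leaves out.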
format Online
Article
Text
id pubmed-4177539
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4177539 2014-09-29 Source Code Biol Med Software Review BioMed Central 2013-11-22 Text en
Copyright © 2013 Silva et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Combining de novo and reference-guided assembly with scaffold_builder
topic Software Review