Assembling millions of short DNA sequences using SSAKE

Summary: Novel DNA sequencing technologies with the potential for up to three orders of magnitude more sequence throughput than conventional Sanger sequencing are emerging. The instrument now available from Solexa Ltd produces millions of short DNA sequences of 25 nt each. Due to ubiquitous repeats in...

Bibliographic Details
Main authors: Warren, René L., Sutton, Granger G., Jones, Steven J. M., Holt, Robert A.
Format: Online Article Text
Language: English
Published: Oxford University Press 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7109930/
https://www.ncbi.nlm.nih.gov/pubmed/17158514
http://dx.doi.org/10.1093/bioinformatics/btl629
collection PubMed
description Summary: Novel DNA sequencing technologies with the potential for up to three orders of magnitude more sequence throughput than conventional Sanger sequencing are emerging. The instrument now available from Solexa Ltd produces millions of short DNA sequences of 25 nt each. Because repeats are ubiquitous in large genomes and short sequences cannot characterize them uniquely and unambiguously, the short read length limits applicability for de novo sequencing. However, given the sequencing depth and the throughput of this instrument, stringent assembly of highly identical sequences can be achieved. We describe SSAKE, a tool for aggressively assembling millions of short nucleotide sequences by progressively searching through a prefix tree for the longest possible overlap between any two sequences. SSAKE is designed to help leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets. Availability: Contact: rwarren@bcgsc.ca
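The overlap search mentioned in the description can be illustrated with a minimal sketch. It is not the SSAKE implementation: a plain dictionary of read prefixes stands in for the prefix tree, extension runs only toward the 3' end, and the names `build_prefix_index`, `extend_right`, and `min_overlap` are hypothetical. Stopping whenever more than one read fits is one assumed reading of "stringent" assembly.

```python
from collections import defaultdict


def build_prefix_index(reads, min_overlap):
    """Map each proper prefix of length >= min_overlap to the reads starting with it.

    Assumption: a dict of prefixes stands in for the prefix tree.
    """
    index = defaultdict(list)
    for read in reads:
        for k in range(min_overlap, len(read)):
            index[read[:k]].append(read)
    return index


def extend_right(seed, reads, min_overlap=3):
    """Greedily extend `seed` at its 3' end, trying the longest overlap first.

    Hypothetical helper: extension stops when no read overlaps or when
    several unused reads share the same overlap (ambiguous extension).
    """
    unused = set(reads) - {seed}
    index = build_prefix_index(unused, min_overlap)
    max_len = max((len(r) for r in unused), default=0)
    contig = seed
    extended = True
    while extended:
        extended = False
        # Try the longest possible overlap first, then shorter ones.
        for k in range(min(len(contig), max_len - 1), min_overlap - 1, -1):
            suffix = contig[-k:]
            candidates = [r for r in index.get(suffix, []) if r in unused]
            if len(candidates) > 1:
                return contig  # ambiguous: stop rather than guess
            if len(candidates) == 1:
                read = candidates[0]
                contig += read[k:]  # append only the non-overlapping tail
                unused.discard(read)
                extended = True
                break
    return contig


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "GTACGG", "ACGGTT"]
    print(extend_right("ACGTAC", toy_reads))
```

With the toy reads above, the seed is extended twice through 4-nt overlaps; a pair of reads that share the same overlap with the contig halts the extension instead.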
format Online
Article
Text
id pubmed-7109930
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7109930 2020-04-02 Assembling millions of short DNA sequences using SSAKE Warren, René L. Sutton, Granger G. Jones, Steven J. M. Holt, Robert A. Bioinformatics Applications Notes Summary: Novel DNA sequencing technologies with the potential for up to three orders of magnitude more sequence throughput than conventional Sanger sequencing are emerging. The instrument now available from Solexa Ltd produces millions of short DNA sequences of 25 nt each. Because repeats are ubiquitous in large genomes and short sequences cannot characterize them uniquely and unambiguously, the short read length limits applicability for de novo sequencing. However, given the sequencing depth and the throughput of this instrument, stringent assembly of highly identical sequences can be achieved. We describe SSAKE, a tool for aggressively assembling millions of short nucleotide sequences by progressively searching through a prefix tree for the longest possible overlap between any two sequences. SSAKE is designed to help leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets. Availability: Contact: rwarren@bcgsc.ca Oxford University Press 2007-02-15 2006-12-08 /pmc/articles/PMC7109930/ /pubmed/17158514 http://dx.doi.org/10.1093/bioinformatics/btl629 Text en © 2006 The Author(s) This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. This article is made available via the PMC Open Access Subset for unrestricted re-use and analyses in any form or by any means with acknowledgement of the original source.
These permissions are granted for the duration of the COVID-19 pandemic or until permissions are revoked in writing. Upon expiration of these permissions, PMC is granted a perpetual license to make this article available via PMC and Europe PMC, consistent with existing copyright protections.
topic Applications Notes