
Crystallizing short-read assemblies around seeds

BACKGROUND: New short-read sequencing technologies produce enormous volumes of 25–30 base paired-end reads. The resulting reads have vastly different characteristics than those produced by Sanger sequencing, and require different approaches than the previous generation of sequence assemblers. In this pape...


Bibliographic Details
Main authors: Hossain, Mohammad Sajjad, Azimi, Navid, Skiena, Steven
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648751/
https://www.ncbi.nlm.nih.gov/pubmed/19208115
http://dx.doi.org/10.1186/1471-2105-10-S1-S16
collection PubMed
description BACKGROUND: New short-read sequencing technologies produce enormous volumes of 25–30 base paired-end reads. The resulting reads have vastly different characteristics than those produced by Sanger sequencing, and require different approaches than the previous generation of sequence assemblers. In this paper, we present a short-read de novo assembler particularly targeted at the new ABI SOLiD sequencing technology. RESULTS: This paper presents what we believe to be the first de novo sequence assembly results on real data from the emerging SOLiD platform, introduced by Applied Biosystems. Our assembler, SHORTY, augments short paired reads using a trivially small number (5–10) of seeds of length 300–500 bp. These seeds enable us to produce significant assemblies using short-read coverage of no more than 100×, which can be obtained in a single run of these high-capacity sequencers. SHORTY exploits two ideas that we believe to be of interest to the short-read assembly community: (1) using single seed reads to crystallize assemblies, and (2) estimating intercontig distances accurately from multiple spanning paired-end reads. CONCLUSION: We demonstrate effective assemblies (N50 contig sizes ~40 kb) of three different bacterial species using simulated SOLiD data. Sequencing artifacts limit our performance on real data; however, our results on this data are substantially better than those achieved by competing assemblers.
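The abstract reports assembly quality as N50 contig size and mentions estimating intercontig distances from multiple spanning paired-end reads. The following is a minimal illustrative sketch of both quantities, not SHORTY's actual implementation. The function names `n50` and `estimate_gap`, the `(a, b)` pair representation, and the simple averaging model (each spanning pair implies a gap of mean insert size minus the bases it covers on the two contigs) are assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


def estimate_gap(mean_insert, spanning_pairs):
    """Estimate the gap between two adjacent contigs (hypothetical model).

    spanning_pairs is a list of (a, b) tuples, one per read pair whose
    mates land on different contigs:
      a = bases from the start of the left read to the end of the left contig
      b = bases from the start of the right contig to the end of the right read
    Each pair implies gap = mean_insert - a - b. Averaging many pairs
    shrinks the standard error roughly as 1 / sqrt(number of pairs).
    Returns (gap_estimate, standard_error).
    """
    gaps = [mean_insert - a - b for a, b in spanning_pairs]
    estimate = mean(gaps)
    # The standard error is undefined for a single observation.
    error = stdev(gaps) / sqrt(len(gaps)) if len(gaps) > 1 else float("nan")
    return estimate, error
```

For example, `n50([80, 70, 50, 40, 30, 20])` walks the sorted lengths until the running total reaches half of the 290 assembled bases, which happens at the 70-base contig. The averaging in `estimate_gap` is what makes many spanning pairs more informative than one: the insert-size noise of individual pairs partly cancels.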
format Text
id pubmed-2648751
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2648751 2009-02-28 Crystallizing short-read assemblies around seeds. Hossain, Mohammad Sajjad; Azimi, Navid; Skiena, Steven. BMC Bioinformatics. Research. BioMed Central 2009-01-30 /pmc/articles/PMC2648751/ /pubmed/19208115 http://dx.doi.org/10.1186/1471-2105-10-S1-S16 Text en Copyright © 2009 Hossain et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research