Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies

While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHo...

Bibliographic Details
Main Authors: Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim
Format: Text
Language: English
Published: Public Library of Science 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1871613/
https://www.ncbi.nlm.nih.gov/pubmed/17534434
http://dx.doi.org/10.1371/journal.pone.0000484
_version_ 1782133447464058880
author Sundquist, Andreas
Ronaghi, Mostafa
Tang, Haixu
Pevzner, Pavel
Batzoglou, Serafim
author_facet Sundquist, Andreas
Ronaghi, Mostafa
Tang, Haixu
Pevzner, Pavel
Batzoglou, Serafim
author_sort Sundquist, Andreas
collection PubMed
description While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology.
format Text
id pubmed-1871613
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-1871613 2007-05-30 Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies Sundquist, Andreas Ronaghi, Mostafa Tang, Haixu Pevzner, Pavel Batzoglou, Serafim PLoS One Research Article While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. Public Library of Science 2007-05-30 /pmc/articles/PMC1871613/ /pubmed/17534434 http://dx.doi.org/10.1371/journal.pone.0000484 Text en Sundquist et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Sundquist, Andreas
Ronaghi, Mostafa
Tang, Haixu
Pevzner, Pavel
Batzoglou, Serafim
Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies
title Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies
title_full Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies
title_fullStr Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies
title_full_unstemmed Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies
title_short Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies
title_sort whole-genome sequencing and assembly with high-throughput, short-read technologies
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1871613/
https://www.ncbi.nlm.nih.gov/pubmed/17534434
http://dx.doi.org/10.1371/journal.pone.0000484
work_keys_str_mv AT sundquistandreas wholegenomesequencingandassemblywithhighthroughputshortreadtechnologies
AT ronaghimostafa wholegenomesequencingandassemblywithhighthroughputshortreadtechnologies
AT tanghaixu wholegenomesequencingandassemblywithhighthroughputshortreadtechnologies
AT pevznerpavel wholegenomesequencingandassemblywithhighthroughputshortreadtechnologies
AT batzoglouserafim wholegenomesequencingandassemblywithhighthroughputshortreadtechnologies
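
Illustrative note on the abstract: the two-level sampling it describes (clones drawn from the genome at high coverage, reads drawn from each clone at low coverage) implies simple coverage arithmetic, sketched below. This is not the authors' code. The function name two_level_plan and every numeric value except the 200 bp read length are assumptions chosen for illustration, and the covered-fraction estimate uses the standard Poisson approximation rather than anything stated in the record.

```python
import math


def two_level_plan(genome_len, clone_len, read_len, clone_cov, read_cov):
    """Coverage arithmetic for a two-level (clone, then read) sampling plan.

    clone_cov: average number of clones covering each genome base.
    read_cov:  average number of reads covering each base within one clone.
    All argument values used below are hypothetical, not taken from the paper.
    """
    # Clones needed so that clone bases total clone_cov times the genome.
    n_clones = math.ceil(clone_cov * genome_len / clone_len)
    # Reads needed per clone so that read bases total read_cov times the clone.
    reads_per_clone = math.ceil(read_cov * clone_len / read_len)
    total_reads = n_clones * reads_per_clone
    # Net read coverage of the genome is roughly clone_cov * read_cov.
    net_coverage = total_reads * read_len / genome_len
    # Poisson approximation: chance that a base lies in at least one clone.
    frac_in_some_clone = 1.0 - math.exp(-clone_cov)
    return {
        "n_clones": n_clones,
        "reads_per_clone": reads_per_clone,
        "total_reads": total_reads,
        "net_coverage": net_coverage,
        "frac_in_some_clone": frac_in_some_clone,
    }


# Hypothetical example: 1 Mb genome, 100 kb clones, 200 bp reads,
# 8x clone coverage and 1.5x read coverage within each clone.
plan = two_level_plan(1_000_000, 100_000, 200, clone_cov=8.0, read_cov=1.5)
for key, value in plan.items():
    print(key, value)
```

The point of the sketch is that low per-clone read coverage still yields substantial net coverage once many overlapping clones cover each base, which is the trade-off the abstract relies on.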