
Meraculous: De Novo Genome Assembly with Short Paired-End Reads

We describe a new algorithm, meraculous, for whole genome assembly of deep paired-end short reads, and apply it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4 megabase genome of the haploid yeast Pichia stipitis. More than 95% of the genome is recovered, with no er...


Bibliographic Details
Main authors: Chapman, Jarrod A.; Ho, Isaac; Sunkara, Sirisha; Luo, Shujun; Schroth, Gary P.; Rokhsar, Daniel S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3158087/
https://www.ncbi.nlm.nih.gov/pubmed/21876754
http://dx.doi.org/10.1371/journal.pone.0023501
author Chapman, Jarrod A.
Ho, Isaac
Sunkara, Sirisha
Luo, Shujun
Schroth, Gary P.
Rokhsar, Daniel S.
collection PubMed
description We describe a new algorithm, meraculous, for whole genome assembly of deep paired-end short reads, and apply it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4 megabase genome of the haploid yeast Pichia stipitis. More than 95% of the genome is recovered, with no errors; half the assembled sequence is in contigs longer than 101 kilobases and in scaffolds longer than 269 kilobases. Incorporating fosmid ends recovers entire chromosomes. Meraculous relies on an efficient and conservative traversal of the subgraph of the k-mer (deBruijn) graph of oligonucleotides with unique high quality extensions in the dataset, avoiding an explicit error correction step as used in other short-read assemblers. A novel memory-efficient hashing scheme is introduced. The resulting contigs are ordered and oriented using paired reads separated by ∼280 bp or ∼3.2 kbp, and many gaps between contigs can be closed using paired-end placements. Practical issues with the dataset are described, and prospects for assembling larger genomes are discussed.
format Online
Article
Text
id pubmed-3158087
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3158087 2011-08-29
Meraculous: De Novo Genome Assembly with Short Paired-End Reads
Chapman, Jarrod A.; Ho, Isaac; Sunkara, Sirisha; Luo, Shujun; Schroth, Gary P.; Rokhsar, Daniel S.
PLoS One
Research Article
Public Library of Science 2011-08-18
/pmc/articles/PMC3158087/ /pubmed/21876754 http://dx.doi.org/10.1371/journal.pone.0023501
Text en
This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication. https://creativecommons.org/publicdomain/zero/1.0/
This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
title Meraculous: De Novo Genome Assembly with Short Paired-End Reads
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3158087/
https://www.ncbi.nlm.nih.gov/pubmed/21876754
http://dx.doi.org/10.1371/journal.pone.0023501
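The abstract describes contig construction as a conservative traversal of k-mers that have unique high-quality extensions. The following is a minimal sketch of that idea, not the authors' implementation. It makes several simplifying assumptions that are not in the record: base qualities are replaced by a plain count threshold (`min_count`), reverse complements are ignored, only rightward extension is shown, and the function names `kmer_extensions`, `unique_extension`, and `extend_right` are hypothetical.

```python
from collections import defaultdict


def kmer_extensions(reads, k):
    """Tally, for every k-mer, the bases observed immediately before and after it."""
    left = defaultdict(lambda: defaultdict(int))
    right = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if i > 0:
                left[kmer][read[i - 1]] += 1
            if i + k < len(read):
                right[kmer][read[i + k]] += 1
    return left, right


def unique_extension(counts, min_count):
    """Return the only base seen at least min_count times, or None if zero or several qualify."""
    supported = [base for base, n in counts.items() if n >= min_count]
    return supported[0] if len(supported) == 1 else None


def extend_right(seed, left, right, k, min_count):
    """Grow a contig rightward from seed while each step is unique in both directions."""
    contig, kmer, seen = seed, seed, {seed}
    while True:
        base = unique_extension(right.get(kmer, {}), min_count)
        if base is None:
            break  # dead end or fork: stop rather than guess
        nxt = kmer[1:] + base
        # Require the next k-mer to point back uniquely to the current one.
        if nxt in seen or unique_extension(left.get(nxt, {}), min_count) != kmer[0]:
            break
        contig += base
        seen.add(nxt)
        kmer = nxt
    return contig


if __name__ == "__main__":
    # Toy data (an assumption for illustration): overlapping error-free reads,
    # each seen twice, plus one read carrying a single-copy error.
    genome = "ATGGCGTACCTGA"
    reads = [genome[i:i + 8] for i in range(6)] * 2 + ["ATGGCTTA"]
    left, right = kmer_extensions(reads, 4)
    print(extend_right("ATGG", left, right, 4, min_count=2))
```

Stopping at any fork or dead end, instead of trying to resolve it, is what keeps this kind of traversal conservative: a low-count erroneous extension never qualifies, so no separate error-correction pass is needed in this sketch.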