Hybrid error correction and de novo assembly of single-molecule sequencing reads

Emerging single-molecule sequencing instruments can generate multi-kilobase sequences with the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of single-molecule reads is challenging, and has limited their use to resequencing bacteria. To address thi...

Bibliographic Details
Main authors: Koren, Sergey; Schatz, Michael C.; Walenz, Brian P.; Martin, Jeffrey; Howard, Jason; Ganapathy, Ganeshkumar; Wang, Zhong; Rasko, David A.; McCombie, W. Richard; Jarvis, Erich D.; Phillippy, Adam M.
Format: Online Article Text
Language: English
Published: 2012
Subjects: Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3707490/
https://www.ncbi.nlm.nih.gov/pubmed/22750884
http://dx.doi.org/10.1038/nbt.2280
author Koren, Sergey
Schatz, Michael C.
Walenz, Brian P.
Martin, Jeffrey
Howard, Jason
Ganapathy, Ganeshkumar
Wang, Zhong
Rasko, David A.
McCombie, W. Richard
Jarvis, Erich D.
Phillippy, Adam M.
collection PubMed
description Emerging single-molecule sequencing instruments can generate multi-kilobase sequences with the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of single-molecule reads is challenging, and has limited their use to resequencing bacteria. To address this limitation, we introduce a novel correction algorithm and assembly strategy that utilizes shorter, high-identity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on PacBio RS reads of phage, prokaryotic, and eukaryotic whole genomes, including the novel genome of the parrot Melopsittacus undulatus, as well as for RNA-seq reads of the corn (Zea mays) transcriptome. Our approach achieves over 99.9% read correction accuracy and produces substantially better assemblies than current sequencing strategies: in the best example, quintupling the median contig size relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
format Online
Article
Text
id pubmed-3707490
institution National Center for Biotechnology Information
language English
publishDate 2012
record_format MEDLINE/PubMed
spelling pubmed-3707490 2013-07-10 Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol. Article. 2012-07-01. /pmc/articles/PMC3707490/ /pubmed/22750884 http://dx.doi.org/10.1038/nbt.2280 Text en. Users may view, print, copy, download, and text- and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
title Hybrid error correction and de novo assembly of single-molecule sequencing reads
topic Article