Hybrid error correction and de novo assembly of single-molecule sequencing reads
Emerging single-molecule sequencing instruments can generate multi-kilobase sequences with the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of single-molecule reads is challenging, and has limited their use to resequencing bacteria. To address thi...
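Main authors: | Koren, Sergey; Schatz, Michael C.; Walenz, Brian P.; Martin, Jeffrey; Howard, Jason; Ganapathy, Ganeshkumar; Wang, Zhong; Rasko, David A.; McCombie, W. Richard; Jarvis, Erich D.; Phillippy, Adam M. |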
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3707490/ https://www.ncbi.nlm.nih.gov/pubmed/22750884 http://dx.doi.org/10.1038/nbt.2280 |
_version_ | 1782276512353878016 |
---|---|
author | Koren, Sergey Schatz, Michael C. Walenz, Brian P. Martin, Jeffrey Howard, Jason Ganapathy, Ganeshkumar Wang, Zhong Rasko, David A. McCombie, W. Richard Jarvis, Erich D. Phillippy, Adam M. |
author_sort | Koren, Sergey |
collection | PubMed |
description | Emerging single-molecule sequencing instruments can generate multi-kilobase sequences with the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of single-molecule reads is challenging, and has limited their use to resequencing bacteria. To address this limitation, we introduce a novel correction algorithm and assembly strategy that utilizes shorter, high-identity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on Pacbio RS reads of phage, prokaryotic, and eukaryotic whole genomes, including the novel genome of the parrot Melopsittacus undulatus, as well as for RNA-seq reads of the corn (Zea mays) transcriptome. Our approach achieves over 99.9% read correction accuracy and produces substantially better assemblies than current sequencing strategies: in the best example, quintupling the median contig size relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly. |
format | Online Article Text |
id | pubmed-3707490 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
record_format | MEDLINE/PubMed |
spelling | pubmed-3707490 2013-07-10 Hybrid error correction and de novo assembly of single-molecule sequencing reads Koren, Sergey Schatz, Michael C. Walenz, Brian P. Martin, Jeffrey Howard, Jason Ganapathy, Ganeshkumar Wang, Zhong Rasko, David A. McCombie, W. Richard Jarvis, Erich D. Phillippy, Adam M. Nat Biotechnol Article Emerging single-molecule sequencing instruments can generate multi-kilobase sequences with the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of single-molecule reads is challenging, and has limited their use to resequencing bacteria. To address this limitation, we introduce a novel correction algorithm and assembly strategy that utilizes shorter, high-identity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on Pacbio RS reads of phage, prokaryotic, and eukaryotic whole genomes, including the novel genome of the parrot Melopsittacus undulatus, as well as for RNA-seq reads of the corn (Zea mays) transcriptome. Our approach achieves over 99.9% read correction accuracy and produces substantially better assemblies than current sequencing strategies: in the best example, quintupling the median contig size relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly. 2012-07-01 /pmc/articles/PMC3707490/ /pubmed/22750884 http://dx.doi.org/10.1038/nbt.2280 Text en Users may view, print, copy, download and text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms |
title | Hybrid error correction and de novo assembly of single-molecule sequencing reads |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3707490/ https://www.ncbi.nlm.nih.gov/pubmed/22750884 http://dx.doi.org/10.1038/nbt.2280 |