ntEdit+Sealer: Efficient Targeted Error Resolution and Automated Finishing of Long‐Read Genome Assemblies

High‐quality genome assemblies are crucial to many biological studies, and utilizing long sequencing reads can help achieve higher assembly contiguity. While long reads can resolve complex and repetitive regions of a genome, their relatively high associated error rates are still a major limitation....

Bibliographic Details
Main authors: Li, Janet X., Coombe, Lauren, Wong, Johnathan, Birol, Inanç, Warren, René L.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9196995/
https://www.ncbi.nlm.nih.gov/pubmed/35567771
http://dx.doi.org/10.1002/cpz1.442
author Li, Janet X.
Coombe, Lauren
Wong, Johnathan
Birol, Inanç
Warren, René L.
collection PubMed
description High‐quality genome assemblies are crucial to many biological studies, and utilizing long sequencing reads can help achieve higher assembly contiguity. While long reads can resolve complex and repetitive regions of a genome, their relatively high associated error rates are still a major limitation. Long reads generally produce draft genome assemblies with lower base quality, which must be corrected with a genome polishing step. Hybrid genome polishing solutions can greatly improve the quality of long‐read genome assemblies by utilizing more accurate short reads to validate bases and correct errors. Currently available hybrid polishing methods rely on read alignments, and are therefore memory‐intensive and do not scale well to large genomes. Here we describe ntEdit+Sealer, an alignment‐free, k‐mer‐based genome finishing protocol that employs memory‐efficient Bloom filters. The protocol includes ntEdit for correcting base errors and small indels, and for marking potentially problematic regions, then Sealer for filling both assembly gaps and problematic regions flagged by ntEdit. ntEdit+Sealer produces highly accurate, error‐corrected genome assemblies, and is available as a Makefile pipeline from https://github.com/bcgsc/ntedit_sealer_protocol.
© 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol: Automated long‐read genome finishing with short reads
Support Protocol: Selecting optimal values for k‐mer lengths (k) and Bloom filter size (b)
format Online
Article
Text
id pubmed-9196995
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-9196995 2022-10-14 ntEdit+Sealer: Efficient Targeted Error Resolution and Automated Finishing of Long‐Read Genome Assemblies Li, Janet X. Coombe, Lauren Wong, Johnathan Birol, Inanç Warren, René L. Curr Protoc Protocol John Wiley and Sons Inc. 2022-05-14 2022-05 /pmc/articles/PMC9196995/ /pubmed/35567771 http://dx.doi.org/10.1002/cpz1.442 Text en © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.
This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title ntEdit+Sealer: Efficient Targeted Error Resolution and Automated Finishing of Long‐Read Genome Assemblies
topic Protocol