ntEdit+Sealer: Efficient Targeted Error Resolution and Automated Finishing of Long‐Read Genome Assemblies
High‐quality genome assemblies are crucial to many biological studies, and utilizing long sequencing reads can help achieve higher assembly contiguity. While long reads can resolve complex and repetitive regions of a genome, their relatively high associated error rates are still a major limitation....
Main authors: | Li, Janet X.; Coombe, Lauren; Wong, Johnathan; Birol, Inanç; Warren, René L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc. 2022 |
Subjects: | Protocol |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9196995/ https://www.ncbi.nlm.nih.gov/pubmed/35567771 http://dx.doi.org/10.1002/cpz1.442 |
_version_ | 1784727304631484416 |
---|---|
author | Li, Janet X. Coombe, Lauren Wong, Johnathan Birol, Inanç Warren, René L. |
author_sort | Li, Janet X. |
collection | PubMed |
description | High‐quality genome assemblies are crucial to many biological studies, and utilizing long sequencing reads can help achieve higher assembly contiguity. While long reads can resolve complex and repetitive regions of a genome, their relatively high associated error rates are still a major limitation. Long reads generally produce draft genome assemblies with lower base quality, which must be corrected with a genome polishing step. Hybrid genome polishing solutions can greatly improve the quality of long‐read genome assemblies by utilizing more accurate short reads to validate bases and correct errors. Currently available hybrid polishing methods rely on read alignments, and are therefore memory‐intensive and do not scale well to large genomes. Here we describe ntEdit+Sealer, an alignment‐free, k‐mer‐based genome finishing protocol that employs memory‐efficient Bloom filters. The protocol includes ntEdit for correcting base errors and small indels, and for marking potentially problematic regions, then Sealer for filling both assembly gaps and problematic regions flagged by ntEdit. ntEdit+Sealer produces highly accurate, error‐corrected genome assemblies, and is available as a Makefile pipeline from https://github.com/bcgsc/ntedit_sealer_protocol. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol: Automated long‐read genome finishing with short reads Support Protocol: Selecting optimal values for k‐mer lengths (k) and Bloom filter size (b) |
format | Online Article Text |
id | pubmed-9196995 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9196995 2022-10-14 Curr Protoc Protocol John Wiley and Sons Inc. 2022-05-14 2022-05 /pmc/articles/PMC9196995/ /pubmed/35567771 http://dx.doi.org/10.1002/cpz1.442 Text en © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC. This is an open access article under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
title | ntEdit+Sealer: Efficient Targeted Error Resolution and Automated Finishing of Long‐Read Genome Assemblies |
topic | Protocol |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9196995/ https://www.ncbi.nlm.nih.gov/pubmed/35567771 http://dx.doi.org/10.1002/cpz1.442 |