ntJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs
SUMMARY: The ability to generate high-quality genome sequences is a cornerstone of modern biological research. Even with recent advances in sequencing technologies, many genome assemblies still do not reach reference-grade quality. Here, we introduce ntJoin, a tool that leverages structural synteny b...
Main authors: | Coombe, Lauren; Nikolić, Vladimir; Chu, Justin; Birol, Inanc; Warren, René L |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | Applications Notes |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320612/ https://www.ncbi.nlm.nih.gov/pubmed/32311025 http://dx.doi.org/10.1093/bioinformatics/btaa253 |
_version_ | 1783551278510505984 |
---|---|
author | Coombe, Lauren; Nikolić, Vladimir; Chu, Justin; Birol, Inanc; Warren, René L |
author_sort | Coombe, Lauren |
collection | PubMed |
description | SUMMARY: The ability to generate high-quality genome sequences is a cornerstone of modern biological research. Even with recent advances in sequencing technologies, many genome assemblies still do not reach reference-grade quality. Here, we introduce ntJoin, a tool that leverages structural synteny between a draft assembly and reference sequence(s) to contiguate and correct the former with respect to the latter. Instead of alignments, ntJoin uses a lightweight mapping approach based on a graph data structure generated from ordered minimizer sketches. The tool can be used in a variety of applications, including improving a draft assembly with a reference-grade genome, a short-read assembly with a draft long-read assembly, and a draft assembly with an assembly from a closely related species. When scaffolding a human short-read assembly using the reference human genome or a long-read assembly, ntJoin improves the NGA50 length 23- and 13-fold, respectively, in under 13 min, using <11 GB of RAM. Compared to existing reference-guided scaffolders, ntJoin generates highly contiguous assemblies faster and with less memory. AVAILABILITY AND IMPLEMENTATION: ntJoin is written in C++ and Python and is freely available at https://github.com/bcgsc/ntjoin. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7320612 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7320612 2020-07-01 ntJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs. Coombe, Lauren; Nikolić, Vladimir; Chu, Justin; Birol, Inanc; Warren, René L. Bioinformatics, Applications Notes. Oxford University Press 2020-06-15 2020-04-20 /pmc/articles/PMC7320612/ /pubmed/32311025 http://dx.doi.org/10.1093/bioinformatics/btaa253 Text en © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | ntJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs |
title_sort | ntjoin: fast and lightweight assembly-guided scaffolding using minimizer graphs |
topic | Applications Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320612/ https://www.ncbi.nlm.nih.gov/pubmed/32311025 http://dx.doi.org/10.1093/bioinformatics/btaa253 |
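
The description above says that ntJoin maps assemblies through a graph built from ordered minimizer sketches rather than through alignments. The following is a minimal Python sketch of that general idea, not ntJoin's actual implementation. The function names, the BLAKE2b hash, the tie-breaking rule, and the filter that keeps only minimizers occurring exactly once in every assembly are all assumptions made for illustration; reverse complements are ignored for brevity.

```python
import hashlib
from collections import Counter, defaultdict


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (an illustrative choice of hash)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def ordered_minimizers(seq: str, k: int, w: int) -> list[tuple[int, int]]:
    """Return (hash, position) minimizers of each window of w k-mers, in sequence order."""
    hashes = [(kmer_hash(seq[i:i + k]), i) for i in range(len(seq) - k + 1)]
    sketch: list[tuple[int, int]] = []
    for start in range(len(hashes) - w + 1):
        # Smallest hash in the window; ties go to the leftmost k-mer.
        best = min(hashes[start:start + w])
        # Adjacent windows often share a minimizer, so record it only once.
        if not sketch or sketch[-1] != best:
            sketch.append(best)
    return sketch


def build_graph(sketches: dict[str, list[tuple[int, int]]]) -> dict[tuple[int, int], int]:
    """Undirected edges between adjacent minimizers, weighted by supporting assemblies."""
    per_assembly = {name: [h for h, _ in s] for name, s in sketches.items()}
    # Assumed filter: keep minimizers seen exactly once in every assembly.
    shared = None
    for hashes in per_assembly.values():
        unique = {h for h, c in Counter(hashes).items() if c == 1}
        shared = unique if shared is None else shared & unique
    edges: dict[tuple[int, int], int] = defaultdict(int)
    for hashes in per_assembly.values():
        kept = [h for h in hashes if h in shared]
        for a, b in zip(kept, kept[1:]):
            edges[(min(a, b), max(a, b))] += 1
    return dict(edges)


if __name__ == "__main__":
    # Toy sequences (hypothetical data): the draft contig is a slice of the reference.
    reference = "ACGTTGCATGCCGATAGCTAGGCTAACGTTAGCCGATCGATTACGGCTA"
    draft_contig = reference[5:40]
    sketches = {
        "reference": ordered_minimizers(reference, k=5, w=4),
        "draft": ordered_minimizers(draft_contig, k=5, w=4),
    }
    graph = build_graph(sketches)
    print(f"{len(graph)} edges; weights seen: {sorted(set(graph.values()))}")
```

In this toy version, an edge supported by both the reference and the draft gets weight 2, while an edge present only in the reference path gets weight 1. That is the kind of signal a scaffolder could use to order and orient draft contigs along the reference.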