LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly
BACKGROUND: Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and have a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity....
Main authors: | Xu, Gui-Cai; Xu, Tian-Jun; Zhu, Rui; Zhang, Yan; Li, Shang-Qi; Wang, Hong-Wei; Li, Jiong-Tang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2018 |
Subjects: | Technical Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6324547/ https://www.ncbi.nlm.nih.gov/pubmed/30576505 http://dx.doi.org/10.1093/gigascience/giy157 |
_version_ | 1783385995819876352 |
---|---|
author | Xu, Gui-Cai Xu, Tian-Jun Zhu, Rui Zhang, Yan Li, Shang-Qi Wang, Hong-Wei Li, Jiong-Tang |
author_facet | Xu, Gui-Cai Xu, Tian-Jun Zhu, Rui Zhang, Yan Li, Shang-Qi Wang, Hong-Wei Li, Jiong-Tang |
author_sort | Xu, Gui-Cai |
collection | PubMed |
description | BACKGROUND: Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and have a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity. However, current gap-closure approaches using long reads require extensive runtime and high memory usage. Thus, a fast and memory-efficient approach using long reads is needed to obtain complete genomes. FINDINGS: We developed LR_Gapcloser to rapidly and efficiently close the gaps in genome assembly. This tool utilizes long reads generated from TGS sequencing platforms. Tested on de novo assembled gaps, repeat-derived gaps, and real gaps, LR_Gapcloser closed a higher number of gaps faster and with a lower error rate and a much lower memory usage than two existing, state-of-the-art tools. This tool utilized raw reads to fill more gaps than when using error-corrected reads. It is applicable to gaps in the assemblies by different approaches and from large and complex genomes. After performing gap-closure using this tool, the contig N50 size of the human CHM1 genome was improved from 143 kb to 19 Mb, a 132-fold increase. We also closed the gaps in the Triticum urartu genome, a large genome rich in repeats; the contig N50 size was increased by 40%. Further, we evaluated the contiguity and correctness of six hybrid assembly strategies by combining the optimal TGS-based and next-generation sequencing-based assemblers with LR_Gapcloser. A proposed and optimal hybrid strategy generated a new human CHM1 genome assembly with marked contiguity. The contig N50 value was greater than 28 Mb, which is larger than previous non-reference assemblies of the diploid human genome. CONCLUSIONS: LR_Gapcloser is a fast and efficient tool that can be used to close gaps and improve the contiguity of genome assemblies. A proposed hybrid assembly including this tool promises reference-grade assemblies. The software is available at http://www.fishbrowser.org/software/LR_Gapcloser/. |
format | Online Article Text |
id | pubmed-6324547 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6324547 2019-01-10 LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly Xu, Gui-Cai Xu, Tian-Jun Zhu, Rui Zhang, Yan Li, Shang-Qi Wang, Hong-Wei Li, Jiong-Tang Gigascience Technical Note BACKGROUND: Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and have a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity. However, current gap-closure approaches using long reads require extensive runtime and high memory usage. Thus, a fast and memory-efficient approach using long reads is needed to obtain complete genomes. FINDINGS: We developed LR_Gapcloser to rapidly and efficiently close the gaps in genome assembly. This tool utilizes long reads generated from TGS sequencing platforms. Tested on de novo assembled gaps, repeat-derived gaps, and real gaps, LR_Gapcloser closed a higher number of gaps faster and with a lower error rate and a much lower memory usage than two existing, state-of-the-art tools. This tool utilized raw reads to fill more gaps than when using error-corrected reads. It is applicable to gaps in the assemblies by different approaches and from large and complex genomes. After performing gap-closure using this tool, the contig N50 size of the human CHM1 genome was improved from 143 kb to 19 Mb, a 132-fold increase. We also closed the gaps in the Triticum urartu genome, a large genome rich in repeats; the contig N50 size was increased by 40%. Further, we evaluated the contiguity and correctness of six hybrid assembly strategies by combining the optimal TGS-based and next-generation sequencing-based assemblers with LR_Gapcloser. A proposed and optimal hybrid strategy generated a new human CHM1 genome assembly with marked contiguity. The contig N50 value was greater than 28 Mb, which is larger than previous non-reference assemblies of the diploid human genome. CONCLUSIONS: LR_Gapcloser is a fast and efficient tool that can be used to close gaps and improve the contiguity of genome assemblies. A proposed hybrid assembly including this tool promises reference-grade assemblies. The software is available at http://www.fishbrowser.org/software/LR_Gapcloser/. Oxford University Press 2018-12-21 /pmc/articles/PMC6324547/ /pubmed/30576505 http://dx.doi.org/10.1093/gigascience/giy157 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Technical Note Xu, Gui-Cai Xu, Tian-Jun Zhu, Rui Zhang, Yan Li, Shang-Qi Wang, Hong-Wei Li, Jiong-Tang LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly |
title | LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly |
title_full | LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly |
title_fullStr | LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly |
title_full_unstemmed | LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly |
title_short | LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly |
title_sort | lr_gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6324547/ https://www.ncbi.nlm.nih.gov/pubmed/30576505 http://dx.doi.org/10.1093/gigascience/giy157 |
work_keys_str_mv | AT xuguicai lrgapcloseratilingpathbasedgapcloserthatuseslongreadstocompletegenomeassembly AT xutianjun lrgapcloseratilingpathbasedgapcloserthatuseslongreadstocompletegenomeassembly AT zhurui lrgapcloseratilingpathbasedgapcloserthatuseslongreadstocompletegenomeassembly AT zhangyan lrgapcloseratilingpathbasedgapcloserthatuseslongreadstocompletegenomeassembly AT lishangqi lrgapcloseratilingpathbasedgapcloserthatuseslongreadstocompletegenomeassembly AT wanghongwei lrgapcloseratilingpathbasedgapcloserthatuseslongreadstocompletegenomeassembly AT lijiongtang lrgapcloseratilingpathbasedgapcloserthatuseslongreadstocompletegenomeassembly |
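
The abstract reports its results as changes in contig N50 (for example, 143 kb to 19 Mb for CHM1; these rounded figures give a ratio of about 133, close to the reported 132-fold increase, which presumably comes from unrounded values). As an illustration of what that metric measures, the following is a minimal sketch of the standard N50 definition applied to a toy scaffold before and after one gap is filled. It is not code from LR_Gapcloser; the function names `split_at_gaps` and `contig_n50` and the toy sequences are assumptions made for this example.

```python
import re


def split_at_gaps(scaffold):
    """Split a scaffold into contigs at runs of 'N' (the usual gap placeholder)."""
    return [piece for piece in re.split(r"[Nn]+", scaffold) if piece]


def contig_n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy scaffold with two gaps (runs of N).
before = "ACGTACGT" + "NNNN" + "ACGT" + "NN" + "AC"
# Same scaffold after the first gap is replaced by real sequence.
after = "ACGTACGT" + "ACGT" + "ACGT" + "NN" + "AC"

n50_before = contig_n50([len(c) for c in split_at_gaps(before)])
n50_after = contig_n50([len(c) for c in split_at_gaps(after)])
fold_increase = n50_after / n50_before
```

Closing a gap merges two contigs into one longer contig, which is why gap closure raises contig N50 even though the scaffold layout is unchanged.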