LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly

BACKGROUND: Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and have a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity....

Bibliographic Details
Main authors: Xu, Gui-Cai, Xu, Tian-Jun, Zhu, Rui, Zhang, Yan, Li, Shang-Qi, Wang, Hong-Wei, Li, Jiong-Tang
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6324547/
https://www.ncbi.nlm.nih.gov/pubmed/30576505
http://dx.doi.org/10.1093/gigascience/giy157
_version_ 1783385995819876352
author Xu, Gui-Cai
Xu, Tian-Jun
Zhu, Rui
Zhang, Yan
Li, Shang-Qi
Wang, Hong-Wei
Li, Jiong-Tang
author_facet Xu, Gui-Cai
Xu, Tian-Jun
Zhu, Rui
Zhang, Yan
Li, Shang-Qi
Wang, Hong-Wei
Li, Jiong-Tang
author_sort Xu, Gui-Cai
collection PubMed
description BACKGROUND: Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and have a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity. However, current gap-closure approaches using long reads require extensive runtime and high memory usage. Thus, a fast and memory-efficient approach using long reads is needed to obtain complete genomes. FINDINGS: We developed LR_Gapcloser to rapidly and efficiently close the gaps in genome assembly. This tool utilizes long reads generated from TGS sequencing platforms. Tested on de novo assembled gaps, repeat-derived gaps, and real gaps, LR_Gapcloser closed a higher number of gaps faster and with a lower error rate and a much lower memory usage than two existing, state-of-the-art tools. This tool utilized raw reads to fill more gaps than when using error-corrected reads. It is applicable to gaps in the assemblies by different approaches and from large and complex genomes. After performing gap-closure using this tool, the contig N50 size of the human CHM1 genome was improved from 143 kb to 19 Mb, a 132-fold increase. We also closed the gaps in the Triticum urartu genome, a large genome rich in repeats; the contig N50 size was increased by 40%. Further, we evaluated the contiguity and correctness of six hybrid assembly strategies by combining the optimal TGS-based and next-generation sequencing-based assemblers with LR_Gapcloser. A proposed and optimal hybrid strategy generated a new human CHM1 genome assembly with marked contiguity. The contig N50 value was greater than 28 Mb, which is larger than previous non-reference assemblies of the diploid human genome. CONCLUSIONS: LR_Gapcloser is a fast and efficient tool that can be used to close gaps and improve the contiguity of genome assemblies. 
A proposed hybrid assembly including this tool promises reference-grade assemblies. The software is available at http://www.fishbrowser.org/software/LR_Gapcloser/.
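The description reports its results as changes in contig N50. As a reading aid, here is a minimal sketch of how that metric is conventionally computed: N50 is the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions, not taken from LR_Gapcloser or the article.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the summed length of all contigs.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50 requires at least one contig length")


# Hypothetical toy assembly (arbitrary units), before gap closure.
before = [40, 30, 20, 10]

# Assumption for illustration: closing the gap between the two longest
# contigs merges them into one; bases filled into the gap are ignored.
after = [70, 20, 10]

print(n50(before), n50(after))
```

Merging two contigs across a closed gap leaves the total length roughly unchanged while moving more of it into a single long contig, which is why gap closure can raise N50 sharply.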
format Online
Article
Text
id pubmed-6324547
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6324547 2019-01-10 LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly Xu, Gui-Cai Xu, Tian-Jun Zhu, Rui Zhang, Yan Li, Shang-Qi Wang, Hong-Wei Li, Jiong-Tang Gigascience Technical Note BACKGROUND: Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and have a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity. However, current gap-closure approaches using long reads require extensive runtime and high memory usage. Thus, a fast and memory-efficient approach using long reads is needed to obtain complete genomes. FINDINGS: We developed LR_Gapcloser to rapidly and efficiently close the gaps in genome assembly. This tool utilizes long reads generated from TGS sequencing platforms. Tested on de novo assembled gaps, repeat-derived gaps, and real gaps, LR_Gapcloser closed a higher number of gaps faster and with a lower error rate and a much lower memory usage than two existing, state-of-the-art tools. This tool utilized raw reads to fill more gaps than when using error-corrected reads. It is applicable to gaps in the assemblies by different approaches and from large and complex genomes. After performing gap-closure using this tool, the contig N50 size of the human CHM1 genome was improved from 143 kb to 19 Mb, a 132-fold increase. We also closed the gaps in the Triticum urartu genome, a large genome rich in repeats; the contig N50 size was increased by 40%. Further, we evaluated the contiguity and correctness of six hybrid assembly strategies by combining the optimal TGS-based and next-generation sequencing-based assemblers with LR_Gapcloser. A proposed and optimal hybrid strategy generated a new human CHM1 genome assembly with marked contiguity. 
The contig N50 value was greater than 28 Mb, which is larger than previous non-reference assemblies of the diploid human genome. CONCLUSIONS: LR_Gapcloser is a fast and efficient tool that can be used to close gaps and improve the contiguity of genome assemblies. A proposed hybrid assembly including this tool promises reference-grade assemblies. The software is available at http://www.fishbrowser.org/software/LR_Gapcloser/. Oxford University Press 2018-12-21 /pmc/articles/PMC6324547/ /pubmed/30576505 http://dx.doi.org/10.1093/gigascience/giy157 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Technical Note
Xu, Gui-Cai
Xu, Tian-Jun
Zhu, Rui
Zhang, Yan
Li, Shang-Qi
Wang, Hong-Wei
Li, Jiong-Tang
LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly
title LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly
title_sort lr_gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6324547/
https://www.ncbi.nlm.nih.gov/pubmed/30576505
http://dx.doi.org/10.1093/gigascience/giy157