
LRScaf: improving draft genomes using long noisy reads

BACKGROUND: The advent of third-generation sequencing (TGS) technologies opens the door to improving genome assembly. Long reads are promising for enhancing the quality of fragmented draft assemblies constructed from next-generation sequencing (NGS) technologies. To date, a few algorithms that are cap...


Bibliographic Details
Main Authors: Qin, Mao; Wu, Shigang; Li, Alun; Zhao, Fengli; Feng, Hu; Ding, Lulu; Ruan, Jue
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6902338/
https://www.ncbi.nlm.nih.gov/pubmed/31818249
http://dx.doi.org/10.1186/s12864-019-6337-2
author Qin, Mao
Wu, Shigang
Li, Alun
Zhao, Fengli
Feng, Hu
Ding, Lulu
Ruan, Jue
collection PubMed
description BACKGROUND: The advent of third-generation sequencing (TGS) technologies opens the door to improving genome assembly. Long reads are promising for enhancing the quality of fragmented draft assemblies constructed from next-generation sequencing (NGS) technologies. To date, a few algorithms capable of improving draft assemblies have been released, including SSPACE-LongRead, OPERA-LG, SMIS, npScarf, DBG2OLC, Unicycler, and LINKS. However, hybrid assembly of large genomes remains challenging. RESULTS: We developed a scalable and computationally efficient scaffolder, Long Reads Scaffolder (LRScaf, https://github.com/shingocat/lrscaf), that is capable of significantly boosting assembly contiguity using long reads. In this study, we summarise a comprehensive performance assessment of state-of-the-art scaffolders and LRScaf on seven organisms: E. coli, S. cerevisiae, A. thaliana, O. sativa, S. pennellii, Z. mays, and H. sapiens. LRScaf significantly improves the contiguity of draft assemblies, e.g., increasing the NGA50 value of CHM1 from 127.1 kbp to 9.4 Mbp using a 20-fold coverage PacBio dataset and the NGA50 value of NA12878 from 115.3 kbp to 12.9 Mbp using a 35-fold coverage Nanopore dataset. In addition, LRScaf achieves the best NGA50 on A. thaliana, S. pennellii, Z. mays, and H. sapiens. Moreover, LRScaf has the shortest run time of the scaffolders compared, and its peak RAM remains practical for large genomes (e.g., 20.3 and 62.6 GB on CHM1 and NA12878, respectively). CONCLUSIONS: The new algorithm, LRScaf, yields the best or, at least, moderate scaffold contiguity and accuracy in the shortest run time compared with other scaffolding algorithms. Furthermore, LRScaf provides a cost-effective way to improve the contiguity of draft assemblies of large genomes.
format Online
Article
Text
id pubmed-6902338
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6902338 2019-12-11 LRScaf: improving draft genomes using long noisy reads Qin, Mao Wu, Shigang Li, Alun Zhao, Fengli Feng, Hu Ding, Lulu Ruan, Jue BMC Genomics Software
BioMed Central 2019-12-09 /pmc/articles/PMC6902338/ /pubmed/31818249 http://dx.doi.org/10.1186/s12864-019-6337-2 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title LRScaf: improving draft genomes using long noisy reads
topic Software