SLR: a scaffolding algorithm based on long reads and contig classification

BACKGROUND: Scaffolding is an important step in genome assembly that orders and orients the contigs produced by assemblers. However, repetitive regions in contigs usually prevent scaffolding from producing accurate results. How to solve the problem of repetitive regions has received a great deal of...

Bibliographic Details
Main Authors: Luo, Junwei, Lyu, Mengna, Chen, Ranran, Zhang, Xiaohong, Luo, Huimin, Yan, Chaokun
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6820941/
https://www.ncbi.nlm.nih.gov/pubmed/31666010
http://dx.doi.org/10.1186/s12859-019-3114-9
author Luo, Junwei
Lyu, Mengna
Chen, Ranran
Zhang, Xiaohong
Luo, Huimin
Yan, Chaokun
collection PubMed
description BACKGROUND: Scaffolding is an important step in genome assembly that orders and orients the contigs produced by assemblers. However, repetitive regions in contigs usually prevent scaffolding from producing accurate results. How to solve the problem of repetitive regions has received a great deal of attention. In the past few years, long reads sequenced by third-generation sequencing technologies (Pacific Biosciences and Oxford Nanopore) have been demonstrated to be useful for sequencing repetitive regions in genomes. Although some stand-alone scaffolding algorithms based on long reads have been presented, scaffolding still requires a new strategy to take full advantage of the characteristics of long reads. RESULTS: Here, we present a new scaffolding algorithm based on long reads and contig classification (SLR). Through the alignment information of long reads and contigs, SLR classifies the contigs into unique contigs and ambiguous contigs for addressing the problem of repetitive regions. Next, SLR uses only unique contigs to produce draft scaffolds. Then, SLR inserts the ambiguous contigs into the draft scaffolds and produces the final scaffolds. We compare SLR to three popular scaffolding tools by using long read datasets sequenced with Pacific Biosciences and Oxford Nanopore technologies. The experimental results show that SLR can produce better results in terms of accuracy and completeness. The open-source code of SLR is available at https://github.com/luojunwei/SLR. CONCLUSION: In this paper, we describe SLR, which is designed to scaffold contigs using long reads. We conclude that SLR can improve the completeness of genome assembly.
format Online
Article
Text
id pubmed-6820941
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6820941 2019-11-04 SLR: a scaffolding algorithm based on long reads and contig classification Luo, Junwei Lyu, Mengna Chen, Ranran Zhang, Xiaohong Luo, Huimin Yan, Chaokun BMC Bioinformatics Methodology Article BACKGROUND: Scaffolding is an important step in genome assembly that orders and orients the contigs produced by assemblers. However, repetitive regions in contigs usually prevent scaffolding from producing accurate results. How to solve the problem of repetitive regions has received a great deal of attention. In the past few years, long reads sequenced by third-generation sequencing technologies (Pacific Biosciences and Oxford Nanopore) have been demonstrated to be useful for sequencing repetitive regions in genomes. Although some stand-alone scaffolding algorithms based on long reads have been presented, scaffolding still requires a new strategy to take full advantage of the characteristics of long reads. RESULTS: Here, we present a new scaffolding algorithm based on long reads and contig classification (SLR). Through the alignment information of long reads and contigs, SLR classifies the contigs into unique contigs and ambiguous contigs for addressing the problem of repetitive regions. Next, SLR uses only unique contigs to produce draft scaffolds. Then, SLR inserts the ambiguous contigs into the draft scaffolds and produces the final scaffolds. We compare SLR to three popular scaffolding tools by using long read datasets sequenced with Pacific Biosciences and Oxford Nanopore technologies. The experimental results show that SLR can produce better results in terms of accuracy and completeness. The open-source code of SLR is available at https://github.com/luojunwei/SLR. CONCLUSION: In this paper, we describe SLR, which is designed to scaffold contigs using long reads. We conclude that SLR can improve the completeness of genome assembly.
BioMed Central 2019-10-30 /pmc/articles/PMC6820941/ /pubmed/31666010 http://dx.doi.org/10.1186/s12859-019-3114-9 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SLR: a scaffolding algorithm based on long reads and contig classification
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6820941/
https://www.ncbi.nlm.nih.gov/pubmed/31666010
http://dx.doi.org/10.1186/s12859-019-3114-9