Reference-based read clustering improves the de novo genome assembly of microbial strains

Constructing accurate microbial genome assemblies is necessary to understand genetic diversity in microbial genomes and its functional consequences. However, this remains a challenging task, especially when only short-read sequencing technologies are used. Here, we present a new read-clusterin...

Bibliographic Details
Main Authors: Sim, Mikang, Lee, Jongin, Kwon, Daehong, Lee, Daehwan, Park, Nayoung, Wy, Suyeon, Ko, Younhee, Kim, Jaebum
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9804104/
https://www.ncbi.nlm.nih.gov/pubmed/36618978
http://dx.doi.org/10.1016/j.csbj.2022.12.032
author Sim, Mikang
Lee, Jongin
Kwon, Daehong
Lee, Daehwan
Park, Nayoung
Wy, Suyeon
Ko, Younhee
Kim, Jaebum
author_sort Sim, Mikang
collection PubMed
description Constructing accurate microbial genome assemblies is necessary to understand genetic diversity in microbial genomes and its functional consequences. However, this remains a challenging task, especially when only short-read sequencing technologies are used. Here, we present a new read-clustering algorithm, called RBRC, for improving de novo microbial genome assembly by accurately estimating read proximity using multiple reference genomes. The performance of RBRC was confirmed by simulation-based evaluation in terms of assembly contiguity and the number of misassemblies, and RBRC was successfully applied to existing fungal and bacterial genomes, improving the quality of the assemblies without using additional sequencing data. RBRC is a useful read-clustering algorithm that can be used (i) for generating high-quality genome assemblies of microbial strains when genome assemblies of related strains are available, and (ii) for upgrading existing microbial genome assemblies when the generation of additional sequencing data, such as long reads, is difficult.
format Online
Article
Text
id pubmed-9804104
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-9804104 2023-01-05 Reference-based read clustering improves the de novo genome assembly of microbial strains Sim, Mikang Lee, Jongin Kwon, Daehong Lee, Daehwan Park, Nayoung Wy, Suyeon Ko, Younhee Kim, Jaebum Comput Struct Biotechnol J Research Article Constructing accurate microbial genome assemblies is necessary to understand genetic diversity in microbial genomes and its functional consequences. However, this remains a challenging task, especially when only short-read sequencing technologies are used. Here, we present a new read-clustering algorithm, called RBRC, for improving de novo microbial genome assembly by accurately estimating read proximity using multiple reference genomes. The performance of RBRC was confirmed by simulation-based evaluation in terms of assembly contiguity and the number of misassemblies, and RBRC was successfully applied to existing fungal and bacterial genomes, improving the quality of the assemblies without using additional sequencing data. RBRC is a useful read-clustering algorithm that can be used (i) for generating high-quality genome assemblies of microbial strains when genome assemblies of related strains are available, and (ii) for upgrading existing microbial genome assemblies when the generation of additional sequencing data, such as long reads, is difficult. Research Network of Computational and Structural Biotechnology 2022-12-21 /pmc/articles/PMC9804104/ /pubmed/36618978 http://dx.doi.org/10.1016/j.csbj.2022.12.032 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Reference-based read clustering improves the de novo genome assembly of microbial strains
topic Research Article