
Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph

Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data. Ther...


Bibliographic Details
Main Authors: Mukherjee, Kingshuk; Rossi, Massimiliano; Salmela, Leena; Boucher, Christina
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8147420/
https://www.ncbi.nlm.nih.gov/pubmed/34034751
http://dx.doi.org/10.1186/s13015-021-00182-9
_version_ 1783697626645921792
author Mukherjee, Kingshuk
Rossi, Massimiliano
Salmela, Leena
Boucher, Christina
collection PubMed
description Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data. There exists only one publicly available non-proprietary method for assembly and one proprietary software package that is available via an executable. Furthermore, the publicly available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006), follows the overlap-layout-consensus (OLC) paradigm and is therefore unable to scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics’ Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph-based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method ran successfully on all three genomes. The method of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) ran successfully only on E. coli. Moreover, on the human genome, rmapper was at least 130 times faster than Bionano Solve, used five times less memory, and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/Rmapper. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13015-021-00182-9.
format Online
Article
Text
id pubmed-8147420
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8147420 2021-05-26 Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph Mukherjee, Kingshuk Rossi, Massimiliano Salmela, Leena Boucher, Christina Algorithms Mol Biol Research BioMed Central 2021-05-25 /pmc/articles/PMC8147420/ /pubmed/34034751 http://dx.doi.org/10.1186/s13015-021-00182-9 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8147420/
https://www.ncbi.nlm.nih.gov/pubmed/34034751
http://dx.doi.org/10.1186/s13015-021-00182-9