Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph
Genome wide optical maps are high resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data. Ther...
Main Authors: | Mukherjee, Kingshuk; Rossi, Massimiliano; Salmela, Leena; Boucher, Christina |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8147420/ https://www.ncbi.nlm.nih.gov/pubmed/34034751 http://dx.doi.org/10.1186/s13015-021-00182-9 |
_version_ | 1783697626645921792 |
---|---|
author | Mukherjee, Kingshuk Rossi, Massimiliano Salmela, Leena Boucher, Christina |
author_facet | Mukherjee, Kingshuk Rossi, Massimiliano Salmela, Leena Boucher, Christina |
author_sort | Mukherjee, Kingshuk |
collection | PubMed |
description | Genome wide optical maps are high resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data. There exists only one publicly-available non-proprietary method for assembly and one proprietary software that is available via an executable. Furthermore, the publicly-available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006), follows the overlap-layout-consensus (OLC) paradigm and is therefore unable to scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics’ Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method was able to successfully run on all three genomes. The method of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) only successfully ran on E. coli. Moreover, on the human genome rmapper was at least 130 times faster than Bionano Solve, used five times less memory and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/Rmapper. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13015-021-00182-9. |
format | Online Article Text |
id | pubmed-8147420 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8147420 2021-05-26 Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph Mukherjee, Kingshuk Rossi, Massimiliano Salmela, Leena Boucher, Christina Algorithms Mol Biol Research Genome wide optical maps are high resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data. There exists only one publicly-available non-proprietary method for assembly and one proprietary software that is available via an executable. Furthermore, the publicly-available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006), follows the overlap-layout-consensus (OLC) paradigm and is therefore unable to scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics’ Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method was able to successfully run on all three genomes. The method of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) only successfully ran on E. coli. Moreover, on the human genome rmapper was at least 130 times faster than Bionano Solve, used five times less memory and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/Rmapper. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13015-021-00182-9. BioMed Central 2021-05-25 /pmc/articles/PMC8147420/ /pubmed/34034751 http://dx.doi.org/10.1186/s13015-021-00182-9 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Mukherjee, Kingshuk Rossi, Massimiliano Salmela, Leena Boucher, Christina Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph |
title | Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph |
title_full | Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph |
title_fullStr | Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph |
title_full_unstemmed | Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph |
title_short | Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph |
title_sort | fast and efficient rmap assembly using the bi-labelled de bruijn graph |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8147420/ https://www.ncbi.nlm.nih.gov/pubmed/34034751 http://dx.doi.org/10.1186/s13015-021-00182-9 |
work_keys_str_mv | AT mukherjeekingshuk fastandefficientrmapassemblyusingthebilabelleddebruijngraph AT rossimassimiliano fastandefficientrmapassemblyusingthebilabelleddebruijngraph AT salmelaleena fastandefficientrmapassemblyusingthebilabelleddebruijngraph AT boucherchristina fastandefficientrmapassemblyusingthebilabelleddebruijngraph |