The omnitig framework can improve genome assembly contiguity in practice
Despite the long history of genome assembly research, there remains a large gap between the theoretical and practical work. There is practical software with little theoretical underpinning of accuracy on one hand and theoretical algorithms which have not been adopted in practice on the other. In thi...
Main authors: | Schmidt, Sebastian; Toivonen, Santeri; Medvedev, Paul; Tomescu, Alexandru I. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Cold Spring Harbor Laboratory
2023
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9915519/ https://www.ncbi.nlm.nih.gov/pubmed/36778435 http://dx.doi.org/10.1101/2023.01.30.526175 |
_version_ | 1784885919110660096 |
---|---|
author | Schmidt, Sebastian Toivonen, Santeri Medvedev, Paul Tomescu, Alexandru I. |
author_facet | Schmidt, Sebastian Toivonen, Santeri Medvedev, Paul Tomescu, Alexandru I. |
author_sort | Schmidt, Sebastian |
collection | PubMed |
description | Despite the long history of genome assembly research, there remains a large gap between the theoretical and practical work. There is practical software with little theoretical underpinning of accuracy on one hand and theoretical algorithms which have not been adopted in practice on the other. In this paper we attempt to bridge the gap between theory and practice by showing how the theoretical safe-and-complete framework can be integrated into existing assemblers in order to improve contiguity. The optimal algorithm in this framework, called the omnitig algorithm, has not been used in practice due to its complexity and its lack of robustness to real data. Instead, we pursue a simplified notion of omnitigs, giving an efficient algorithm to compute them and demonstrating their safety under certain conditions. We modify two assemblers (wtdbg2 and Flye) by replacing their unitig algorithm with the simple omnitig algorithm. We test our modifications using real HiFi data from the Drosophila melanogaster and Caenorhabditis elegans genomes. Our modified algorithms lead to a substantial improvement in alignment-based contiguity, with negligible computational costs and either no or a small increase in the number of misassemblies. |
format | Online Article Text |
id | pubmed-9915519 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-99155192023-02-11 The omnitig framework can improve genome assembly contiguity in practice Schmidt, Sebastian Toivonen, Santeri Medvedev, Paul Tomescu, Alexandru I. bioRxiv Article Despite the long history of genome assembly research, there remains a large gap between the theoretical and practical work. There is practical software with little theoretical underpinning of accuracy on one hand and theoretical algorithms which have not been adopted in practice on the other. In this paper we attempt to bridge the gap between theory and practice by showing how the theoretical safe-and-complete framework can be integrated into existing assemblers in order to improve contiguity. The optimal algorithm in this framework, called the omnitig algorithm, has not been used in practice due to its complexity and its lack of robustness to real data. Instead, we pursue a simplified notion of omnitigs, giving an efficient algorithm to compute them and demonstrating their safety under certain conditions. We modify two assemblers (wtdbg2 and Flye) by replacing their unitig algorithm with the simple omnitig algorithm. We test our modifications using real HiFi data from the Drosophila melanogaster and Caenorhabditis elegans genomes. Our modified algorithms lead to a substantial improvement in alignment-based contiguity, with negligible computational costs and either no or a small increase in the number of misassemblies. Cold Spring Harbor Laboratory 2023-02-02 /pmc/articles/PMC9915519/ /pubmed/36778435 http://dx.doi.org/10.1101/2023.01.30.526175 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
spellingShingle | Article Schmidt, Sebastian Toivonen, Santeri Medvedev, Paul Tomescu, Alexandru I. The omnitig framework can improve genome assembly contiguity in practice |
title | The omnitig framework can improve genome assembly contiguity in practice |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9915519/ https://www.ncbi.nlm.nih.gov/pubmed/36778435 http://dx.doi.org/10.1101/2023.01.30.526175 |