The omnitig framework can improve genome assembly contiguity in practice

Bibliographic Details
Main Authors: Schmidt, Sebastian; Toivonen, Santeri; Medvedev, Paul; Tomescu, Alexandru I.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9915519/
https://www.ncbi.nlm.nih.gov/pubmed/36778435
http://dx.doi.org/10.1101/2023.01.30.526175
author Schmidt, Sebastian
Toivonen, Santeri
Medvedev, Paul
Tomescu, Alexandru I.
collection PubMed
description Despite the long history of genome assembly research, there remains a large gap between the theoretical and practical work. There is practical software with little theoretical underpinning of accuracy on one hand and theoretical algorithms which have not been adopted in practice on the other. In this paper we attempt to bridge the gap between theory and practice by showing how the theoretical safe-and-complete framework can be integrated into existing assemblers in order to improve contiguity. The optimal algorithm in this framework, called the omnitig algorithm, has not been used in practice due to its complexity and its lack of robustness to real data. Instead, we pursue a simplified notion of omnitigs, giving an efficient algorithm to compute them and demonstrating their safety under certain conditions. We modify two assemblers (wtdbg2 and Flye) by replacing their unitig algorithm with the simple omnitig algorithm. We test our modifications using real HiFi data from the Drosophila melanogaster and the Caenorhabditis elegans genomes. Our modified algorithms lead to a substantial improvement in alignment-based contiguity, with negligible computational costs and either no or a small increase in the number of misassemblies.
format Online
Article
Text
id pubmed-9915519
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
source bioRxiv, posted 2023-02-02 (Cold Spring Harbor Laboratory)
license This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title The omnitig framework can improve genome assembly contiguity in practice
topic Article