Cargando…

Using the longest run subsequence problem within homology-based scaffolding

Genome assembly is one of the most important problems in computational genomics. Here, we suggest addressing an issue that arises in homology-based scaffolding, that is, when linking and ordering contigs to obtain larger pseudo-chromosomes by means of a second incomplete assembly of a related specie...

Descripción completa

Detalles Bibliográficos
Autores principales: Schrinner, Sven, Goel, Manish, Wulfert, Michael, Spohr, Philipp, Schneeberger, Korbinian, Klau, Gunnar W.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8240273/
https://www.ncbi.nlm.nih.gov/pubmed/34183036
http://dx.doi.org/10.1186/s13015-021-00191-8
_version_ 1783715181205913600
author Schrinner, Sven
Goel, Manish
Wulfert, Michael
Spohr, Philipp
Schneeberger, Korbinian
Klau, Gunnar W.
author_facet Schrinner, Sven
Goel, Manish
Wulfert, Michael
Spohr, Philipp
Schneeberger, Korbinian
Klau, Gunnar W.
author_sort Schrinner, Sven
collection PubMed
description Genome assembly is one of the most important problems in computational genomics. Here, we suggest addressing an issue that arises in homology-based scaffolding, that is, when linking and ordering contigs to obtain larger pseudo-chromosomes by means of a second incomplete assembly of a related species. The idea is to use alignments of binned regions in one contig to find the most homologous contig in the other assembly. We show that ordering the contigs of the other assembly can be expressed by a new string problem, the longest run subsequence problem (LRS). We show that LRS is NP-hard and present reduction rules and two algorithmic approaches that, together, are able to solve large instances of LRS to provable optimality. All data used in the experiments as well as our source code are freely available. We demonstrate its usefulness within an existing larger scaffolding approach by solving realistic instances resulting from partial Arabidopsis thaliana assemblies in short computation time.
format Online
Article
Text
id pubmed-8240273
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-82402732021-06-29 Using the longest run subsequence problem within homology-based scaffolding Schrinner, Sven Goel, Manish Wulfert, Michael Spohr, Philipp Schneeberger, Korbinian Klau, Gunnar W. Algorithms Mol Biol Research Genome assembly is one of the most important problems in computational genomics. Here, we suggest addressing an issue that arises in homology-based scaffolding, that is, when linking and ordering contigs to obtain larger pseudo-chromosomes by means of a second incomplete assembly of a related species. The idea is to use alignments of binned regions in one contig to find the most homologous contig in the other assembly. We show that ordering the contigs of the other assembly can be expressed by a new string problem, the longest run subsequence problem (LRS). We show that LRS is NP-hard and present reduction rules and two algorithmic approaches that, together, are able to solve large instances of LRS to provable optimality. All data used in the experiments as well as our source code are freely available. We demonstrate its usefulness within an existing larger scaffolding approach by solving realistic instances resulting from partial Arabidopsis thaliana assemblies in short computation time. BioMed Central 2021-06-28 /pmc/articles/PMC8240273/ /pubmed/34183036 http://dx.doi.org/10.1186/s13015-021-00191-8 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Schrinner, Sven
Goel, Manish
Wulfert, Michael
Spohr, Philipp
Schneeberger, Korbinian
Klau, Gunnar W.
Using the longest run subsequence problem within homology-based scaffolding
title Using the longest run subsequence problem within homology-based scaffolding
title_full Using the longest run subsequence problem within homology-based scaffolding
title_fullStr Using the longest run subsequence problem within homology-based scaffolding
title_full_unstemmed Using the longest run subsequence problem within homology-based scaffolding
title_short Using the longest run subsequence problem within homology-based scaffolding
title_sort using the longest run subsequence problem within homology-based scaffolding
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8240273/
https://www.ncbi.nlm.nih.gov/pubmed/34183036
http://dx.doi.org/10.1186/s13015-021-00191-8
work_keys_str_mv AT schrinnersven usingthelongestrunsubsequenceproblemwithinhomologybasedscaffolding
AT goelmanish usingthelongestrunsubsequenceproblemwithinhomologybasedscaffolding
AT wulfertmichael usingthelongestrunsubsequenceproblemwithinhomologybasedscaffolding
AT spohrphilipp usingthelongestrunsubsequenceproblemwithinhomologybasedscaffolding
AT schneebergerkorbinian usingthelongestrunsubsequenceproblemwithinhomologybasedscaffolding
AT klaugunnarw usingthelongestrunsubsequenceproblemwithinhomologybasedscaffolding