Cargando…

Superbubbles revisited

BACKGROUND: Superbubbles are distinctive subgraphs in direct graphs that play an important role in assembly algorithms for high-throughput sequencing (HTS) data. Their practical importance derives from the fact they are connected to their host graph by a single entrance and a single exit vertex, thu...

Descripción completa

Detalles Bibliográficos
Autores principales: Gärtner, Fabian, Müller, Lydia, Stadler, Peter F.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2018
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6271648/
https://www.ncbi.nlm.nih.gov/pubmed/30519278
http://dx.doi.org/10.1186/s13015-018-0134-3
_version_ 1783376975662940160
author Gärtner, Fabian
Müller, Lydia
Stadler, Peter F.
author_facet Gärtner, Fabian
Müller, Lydia
Stadler, Peter F.
author_sort Gärtner, Fabian
collection PubMed
description BACKGROUND: Superbubbles are distinctive subgraphs in direct graphs that play an important role in assembly algorithms for high-throughput sequencing (HTS) data. Their practical importance derives from the fact they are connected to their host graph by a single entrance and a single exit vertex, thus allowing them to be handled independently. Efficient algorithms for the enumeration of superbubbles are therefore of important for the processing of HTS data. Superbubbles can be identified within the strongly connected components of the input digraph after transforming them into directed acyclic graphs. The algorithm by Sung et al. (IEEE ACM Trans Comput Biol Bioinform 12:770–777, 2015) achieves this task in [Formula: see text] -time. The extraction of superbubbles from the transformed components was later improved to by Brankovic et al. (Theor Comput Sci 609:374–383, 2016) resulting in an overall [Formula: see text] -time algorithm. RESULTS: A re-analysis of the mathematical structure of superbubbles showed that the construction of auxiliary DAGs from the strongly connected components in the work of Sung et al. missed some details that can lead to the reporting of false positive superbubbles. We propose an alternative, even simpler auxiliary graph that solved the problem and retains the linear running time for general digraph. Furthermore, we describe a simpler, space-efficient [Formula: see text] -time algorithm for detecting superbubbles in DAGs that uses only simple data structures. IMPLEMENTATION: We present a reference implementation of the algorithm that accepts many commonly used formats for the input graph and provides convenient access to the improved algorithm. https://github.com/Fabianexe/Superbubble.
format Online
Article
Text
id pubmed-6271648
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-62716482018-12-05 Superbubbles revisited Gärtner, Fabian Müller, Lydia Stadler, Peter F. Algorithms Mol Biol Research BACKGROUND: Superbubbles are distinctive subgraphs in direct graphs that play an important role in assembly algorithms for high-throughput sequencing (HTS) data. Their practical importance derives from the fact they are connected to their host graph by a single entrance and a single exit vertex, thus allowing them to be handled independently. Efficient algorithms for the enumeration of superbubbles are therefore of important for the processing of HTS data. Superbubbles can be identified within the strongly connected components of the input digraph after transforming them into directed acyclic graphs. The algorithm by Sung et al. (IEEE ACM Trans Comput Biol Bioinform 12:770–777, 2015) achieves this task in [Formula: see text] -time. The extraction of superbubbles from the transformed components was later improved to by Brankovic et al. (Theor Comput Sci 609:374–383, 2016) resulting in an overall [Formula: see text] -time algorithm. RESULTS: A re-analysis of the mathematical structure of superbubbles showed that the construction of auxiliary DAGs from the strongly connected components in the work of Sung et al. missed some details that can lead to the reporting of false positive superbubbles. We propose an alternative, even simpler auxiliary graph that solved the problem and retains the linear running time for general digraph. Furthermore, we describe a simpler, space-efficient [Formula: see text] -time algorithm for detecting superbubbles in DAGs that uses only simple data structures. IMPLEMENTATION: We present a reference implementation of the algorithm that accepts many commonly used formats for the input graph and provides convenient access to the improved algorithm. https://github.com/Fabianexe/Superbubble. BioMed Central 2018-12-01 /pmc/articles/PMC6271648/ /pubmed/30519278 http://dx.doi.org/10.1186/s13015-018-0134-3 Text en © The Author(s) 2018 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Gärtner, Fabian
Müller, Lydia
Stadler, Peter F.
Superbubbles revisited
title Superbubbles revisited
title_full Superbubbles revisited
title_fullStr Superbubbles revisited
title_full_unstemmed Superbubbles revisited
title_short Superbubbles revisited
title_sort superbubbles revisited
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6271648/
https://www.ncbi.nlm.nih.gov/pubmed/30519278
http://dx.doi.org/10.1186/s13015-018-0134-3
work_keys_str_mv AT gartnerfabian superbubblesrevisited
AT mullerlydia superbubblesrevisited
AT stadlerpeterf superbubblesrevisited