Cargando…
Superbubbles revisited
BACKGROUND: Superbubbles are distinctive subgraphs in direct graphs that play an important role in assembly algorithms for high-throughput sequencing (HTS) data. Their practical importance derives from the fact they are connected to their host graph by a single entrance and a single exit vertex, thu...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2018
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6271648/ https://www.ncbi.nlm.nih.gov/pubmed/30519278 http://dx.doi.org/10.1186/s13015-018-0134-3 |
_version_ | 1783376975662940160 |
---|---|
author | Gärtner, Fabian Müller, Lydia Stadler, Peter F. |
author_facet | Gärtner, Fabian Müller, Lydia Stadler, Peter F. |
author_sort | Gärtner, Fabian |
collection | PubMed |
description | BACKGROUND: Superbubbles are distinctive subgraphs in direct graphs that play an important role in assembly algorithms for high-throughput sequencing (HTS) data. Their practical importance derives from the fact they are connected to their host graph by a single entrance and a single exit vertex, thus allowing them to be handled independently. Efficient algorithms for the enumeration of superbubbles are therefore of important for the processing of HTS data. Superbubbles can be identified within the strongly connected components of the input digraph after transforming them into directed acyclic graphs. The algorithm by Sung et al. (IEEE ACM Trans Comput Biol Bioinform 12:770–777, 2015) achieves this task in [Formula: see text] -time. The extraction of superbubbles from the transformed components was later improved to by Brankovic et al. (Theor Comput Sci 609:374–383, 2016) resulting in an overall [Formula: see text] -time algorithm. RESULTS: A re-analysis of the mathematical structure of superbubbles showed that the construction of auxiliary DAGs from the strongly connected components in the work of Sung et al. missed some details that can lead to the reporting of false positive superbubbles. We propose an alternative, even simpler auxiliary graph that solved the problem and retains the linear running time for general digraph. Furthermore, we describe a simpler, space-efficient [Formula: see text] -time algorithm for detecting superbubbles in DAGs that uses only simple data structures. IMPLEMENTATION: We present a reference implementation of the algorithm that accepts many commonly used formats for the input graph and provides convenient access to the improved algorithm. https://github.com/Fabianexe/Superbubble. |
format | Online Article Text |
id | pubmed-6271648 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-62716482018-12-05 Superbubbles revisited Gärtner, Fabian Müller, Lydia Stadler, Peter F. Algorithms Mol Biol Research BACKGROUND: Superbubbles are distinctive subgraphs in direct graphs that play an important role in assembly algorithms for high-throughput sequencing (HTS) data. Their practical importance derives from the fact they are connected to their host graph by a single entrance and a single exit vertex, thus allowing them to be handled independently. Efficient algorithms for the enumeration of superbubbles are therefore of important for the processing of HTS data. Superbubbles can be identified within the strongly connected components of the input digraph after transforming them into directed acyclic graphs. The algorithm by Sung et al. (IEEE ACM Trans Comput Biol Bioinform 12:770–777, 2015) achieves this task in [Formula: see text] -time. The extraction of superbubbles from the transformed components was later improved to by Brankovic et al. (Theor Comput Sci 609:374–383, 2016) resulting in an overall [Formula: see text] -time algorithm. RESULTS: A re-analysis of the mathematical structure of superbubbles showed that the construction of auxiliary DAGs from the strongly connected components in the work of Sung et al. missed some details that can lead to the reporting of false positive superbubbles. We propose an alternative, even simpler auxiliary graph that solved the problem and retains the linear running time for general digraph. Furthermore, we describe a simpler, space-efficient [Formula: see text] -time algorithm for detecting superbubbles in DAGs that uses only simple data structures. IMPLEMENTATION: We present a reference implementation of the algorithm that accepts many commonly used formats for the input graph and provides convenient access to the improved algorithm. https://github.com/Fabianexe/Superbubble. BioMed Central 2018-12-01 /pmc/articles/PMC6271648/ /pubmed/30519278 http://dx.doi.org/10.1186/s13015-018-0134-3 Text en © The Author(s) 2018 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Gärtner, Fabian Müller, Lydia Stadler, Peter F. Superbubbles revisited |
title | Superbubbles revisited |
title_full | Superbubbles revisited |
title_fullStr | Superbubbles revisited |
title_full_unstemmed | Superbubbles revisited |
title_short | Superbubbles revisited |
title_sort | superbubbles revisited |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6271648/ https://www.ncbi.nlm.nih.gov/pubmed/30519278 http://dx.doi.org/10.1186/s13015-018-0134-3 |
work_keys_str_mv | AT gartnerfabian superbubblesrevisited AT mullerlydia superbubblesrevisited AT stadlerpeterf superbubblesrevisited |