Cargando…

On the combinatorics of sparsification

BACKGROUND: We study the sparsification of dynamic programming based on folding algorithms of RNA structures. Sparsification is a method that improves significantly the computation of minimum free energy (mfe) RNA structures. RESULTS: We provide a quantitative analysis of the sparsification of a par...

Descripción completa

Detalles Bibliográficos
Autores principales: Huang, Fenix WD, Reidys, Christian M
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2012
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549849/
https://www.ncbi.nlm.nih.gov/pubmed/23088372
http://dx.doi.org/10.1186/1748-7188-7-28
_version_ 1782256484392894464
author Huang, Fenix WD
Reidys, Christian M
author_facet Huang, Fenix WD
Reidys, Christian M
author_sort Huang, Fenix WD
collection PubMed
description BACKGROUND: We study the sparsification of dynamic programming based on folding algorithms of RNA structures. Sparsification is a method that improves significantly the computation of minimum free energy (mfe) RNA structures. RESULTS: We provide a quantitative analysis of the sparsification of a particular decomposition rule, Λ(∗). This rule splits an interval of RNA secondary and pseudoknot structures of fixed topological genus. Key for quantifying sparsifications is the size of the so called candidate sets. Here we assume mfe-structures to be specifically distributed (see Assumption 1) within arbitrary and irreducible RNA secondary and pseudoknot structures of fixed topological genus. We then present a combinatorial framework which allows by means of probabilities of irreducible sub-structures to obtain the expectation of the Λ(∗)-candidate set w.r.t. a uniformly random input sequence. We compute these expectations for arc-based energy models via energy-filtered generating functions (GF) in case of RNA secondary structures as well as RNA pseudoknot structures. Furthermore, for RNA secondary structures we also analyze a simplified loop-based energy model. Our combinatorial analysis is then compared to the expected number of Λ(∗)-candidates obtained from the folding mfe-structures. In case of the mfe-folding of RNA secondary structures with a simplified loop-based energy model our results imply that sparsification provides a significant, constant improvement of 91% (theory) to be compared to an 96% (experimental, simplified arc-based model) reduction. However, we do not observe a linear factor improvement. Finally, in case of the “full” loop-energy model we can report a reduction of 98% (experiment). CONCLUSIONS: Sparsification was initially attributed a linear factor improvement. This conclusion was based on the so called polymer-zeta property, which stems from interpreting polymer chains as self-avoiding walks. Subsequent findings however reveal that the O(n) improvement is not correct. The combinatorial analysis presented here shows that, assuming a specific distribution (see Assumption 1), of mfe-structures within irreducible and arbitrary structures, the expected number of Λ(∗)-candidates is Θ(n(2)). However, the constant reduction is quite significant, being in the range of 96%. We furthermore show an analogous result for the sparsification of the Λ(∗)-decomposition rule for RNA pseudoknotted structures of genus one. Finally we observe that the effect of sparsification is sensitive to the employed energy model.
format Online
Article
Text
id pubmed-3549849
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-35498492013-01-23 On the combinatorics of sparsification Huang, Fenix WD Reidys, Christian M Algorithms Mol Biol Research BACKGROUND: We study the sparsification of dynamic programming based on folding algorithms of RNA structures. Sparsification is a method that improves significantly the computation of minimum free energy (mfe) RNA structures. RESULTS: We provide a quantitative analysis of the sparsification of a particular decomposition rule, Λ(∗). This rule splits an interval of RNA secondary and pseudoknot structures of fixed topological genus. Key for quantifying sparsifications is the size of the so called candidate sets. Here we assume mfe-structures to be specifically distributed (see Assumption 1) within arbitrary and irreducible RNA secondary and pseudoknot structures of fixed topological genus. We then present a combinatorial framework which allows by means of probabilities of irreducible sub-structures to obtain the expectation of the Λ(∗)-candidate set w.r.t. a uniformly random input sequence. We compute these expectations for arc-based energy models via energy-filtered generating functions (GF) in case of RNA secondary structures as well as RNA pseudoknot structures. Furthermore, for RNA secondary structures we also analyze a simplified loop-based energy model. Our combinatorial analysis is then compared to the expected number of Λ(∗)-candidates obtained from the folding mfe-structures. In case of the mfe-folding of RNA secondary structures with a simplified loop-based energy model our results imply that sparsification provides a significant, constant improvement of 91% (theory) to be compared to an 96% (experimental, simplified arc-based model) reduction. However, we do not observe a linear factor improvement. Finally, in case of the “full” loop-energy model we can report a reduction of 98% (experiment). CONCLUSIONS: Sparsification was initially attributed a linear factor improvement. This conclusion was based on the so called polymer-zeta property, which stems from interpreting polymer chains as self-avoiding walks. Subsequent findings however reveal that the O(n) improvement is not correct. The combinatorial analysis presented here shows that, assuming a specific distribution (see Assumption 1), of mfe-structures within irreducible and arbitrary structures, the expected number of Λ(∗)-candidates is Θ(n(2)). However, the constant reduction is quite significant, being in the range of 96%. We furthermore show an analogous result for the sparsification of the Λ(∗)-decomposition rule for RNA pseudoknotted structures of genus one. Finally we observe that the effect of sparsification is sensitive to the employed energy model. BioMed Central 2012-10-22 /pmc/articles/PMC3549849/ /pubmed/23088372 http://dx.doi.org/10.1186/1748-7188-7-28 Text en Copyright © 2012 Huang and Reidys; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Huang, Fenix WD
Reidys, Christian M
On the combinatorics of sparsification
title On the combinatorics of sparsification
title_full On the combinatorics of sparsification
title_fullStr On the combinatorics of sparsification
title_full_unstemmed On the combinatorics of sparsification
title_short On the combinatorics of sparsification
title_sort on the combinatorics of sparsification
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549849/
https://www.ncbi.nlm.nih.gov/pubmed/23088372
http://dx.doi.org/10.1186/1748-7188-7-28
work_keys_str_mv AT huangfenixwd onthecombinatoricsofsparsification
AT reidyschristianm onthecombinatoricsofsparsification