Cargando…

Random generation of RNA secondary structures according to native distributions

BACKGROUND: Random biological sequences are a topic of great interest in genome analysis since, according to a powerful paradigm, they represent the background noise from which the actual biological information must differentiate. Accordingly, the generation of random sequences has been investigated...

Descripción completa

Detalles Bibliográficos
Autores principales:	Nebel, Markus E, Scheid, Anika, Weinberg, Frank
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2011
Materias:	Research
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3354341/ https://www.ncbi.nlm.nih.gov/pubmed/21992500 http://dx.doi.org/10.1186/1748-7188-6-24

_version_	1782233200660054016
author	Nebel, Markus E Scheid, Anika Weinberg, Frank
author_facet	Nebel, Markus E Scheid, Anika Weinberg, Frank
author_sort	Nebel, Markus E
collection	PubMed
description	BACKGROUND: Random biological sequences are a topic of great interest in genome analysis since, according to a powerful paradigm, they represent the background noise from which the actual biological information must differentiate. Accordingly, the generation of random sequences has been investigated for a long time. Similarly, random object of a more complicated structure like RNA molecules or proteins are of interest. RESULTS: In this article, we present a new general framework for deriving algorithms for the non-uniform random generation of combinatorial objects according to the encoding and probability distribution implied by a stochastic context-free grammar. Briefly, the framework extends on the well-known recursive method for (uniform) random generation and uses the popular framework of admissible specifications of combinatorial classes, introducing weighted combinatorial classes to allow for the non-uniform generation by means of unranking. This framework is used to derive an algorithm for the generation of RNA secondary structures of a given fixed size. We address the random generation of these structures according to a realistic distribution obtained from real-life data by using a very detailed context-free grammar (that models the class of RNA secondary structures by distinguishing between all known motifs in RNA structure). Compared to well-known sampling approaches used in several structure prediction tools (such as SFold) ours has two major advantages: Firstly, after a preprocessing step in time [Formula: see text] for the computation of all weighted class sizes needed, with our approach a set of m random secondary structures of a given structure size n can be computed in worst-case time complexity [Formula: see text] while other algorithms typically have a runtime in [Formula: see text]. Secondly, our approach works with integer arithmetic only which is faster and saves us from all the discomforting details of using floating point arithmetic with logarithmized probabilities. CONCLUSION: A number of experimental results shows that our random generation method produces realistic output, at least with respect to the appearance of the different structural motifs. The algorithm is available as a webservice at http://wwwagak.cs.uni-kl.de/NonUniRandGen and can be used for generating random secondary structures of any specified RNA type. A link to download an implementation of our method (in Wolfram Mathematica) can be found there, too.
format	Online Article Text
id	pubmed-3354341
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-33543412012-05-18 Random generation of RNA secondary structures according to native distributions Nebel, Markus E Scheid, Anika Weinberg, Frank Algorithms Mol Biol Research BACKGROUND: Random biological sequences are a topic of great interest in genome analysis since, according to a powerful paradigm, they represent the background noise from which the actual biological information must differentiate. Accordingly, the generation of random sequences has been investigated for a long time. Similarly, random object of a more complicated structure like RNA molecules or proteins are of interest. RESULTS: In this article, we present a new general framework for deriving algorithms for the non-uniform random generation of combinatorial objects according to the encoding and probability distribution implied by a stochastic context-free grammar. Briefly, the framework extends on the well-known recursive method for (uniform) random generation and uses the popular framework of admissible specifications of combinatorial classes, introducing weighted combinatorial classes to allow for the non-uniform generation by means of unranking. This framework is used to derive an algorithm for the generation of RNA secondary structures of a given fixed size. We address the random generation of these structures according to a realistic distribution obtained from real-life data by using a very detailed context-free grammar (that models the class of RNA secondary structures by distinguishing between all known motifs in RNA structure). Compared to well-known sampling approaches used in several structure prediction tools (such as SFold) ours has two major advantages: Firstly, after a preprocessing step in time [Formula: see text] for the computation of all weighted class sizes needed, with our approach a set of m random secondary structures of a given structure size n can be computed in worst-case time complexity [Formula: see text] while other algorithms typically have a runtime in [Formula: see text]. Secondly, our approach works with integer arithmetic only which is faster and saves us from all the discomforting details of using floating point arithmetic with logarithmized probabilities. CONCLUSION: A number of experimental results shows that our random generation method produces realistic output, at least with respect to the appearance of the different structural motifs. The algorithm is available as a webservice at http://wwwagak.cs.uni-kl.de/NonUniRandGen and can be used for generating random secondary structures of any specified RNA type. A link to download an implementation of our method (in Wolfram Mathematica) can be found there, too. BioMed Central 2011-10-12 /pmc/articles/PMC3354341/ /pubmed/21992500 http://dx.doi.org/10.1186/1748-7188-6-24 Text en Copyright ©2011 Nebel et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Nebel, Markus E Scheid, Anika Weinberg, Frank Random generation of RNA secondary structures according to native distributions
title	Random generation of RNA secondary structures according to native distributions
title_full	Random generation of RNA secondary structures according to native distributions
title_fullStr	Random generation of RNA secondary structures according to native distributions
title_full_unstemmed	Random generation of RNA secondary structures according to native distributions
title_short	Random generation of RNA secondary structures according to native distributions
title_sort	random generation of rna secondary structures according to native distributions
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3354341/ https://www.ncbi.nlm.nih.gov/pubmed/21992500 http://dx.doi.org/10.1186/1748-7188-6-24
work_keys_str_mv	AT nebelmarkuse randomgenerationofrnasecondarystructuresaccordingtonativedistributions AT scheidanika randomgenerationofrnasecondarystructuresaccordingtonativedistributions AT weinbergfrank randomgenerationofrnasecondarystructuresaccordingtonativedistributions

Random generation of RNA secondary structures according to native distributions

Ejemplares similares