
Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction

BACKGROUND: Many bioinformatics tools for RNA secondary structure analysis are based on a thermodynamic model of RNA folding. They predict a single, "optimal" structure by free energy minimization, they enumerate near-optimal structures, they compute base pair probabilities and dot plots,...


Bibliographic Details
Main authors: Janssen, Stefan, Schudoma, Christian, Steger, Gerhard, Giegerich, Robert
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3293930/
https://www.ncbi.nlm.nih.gov/pubmed/22051375
http://dx.doi.org/10.1186/1471-2105-12-429
_version_ 1782225461951070208
author Janssen, Stefan
Schudoma, Christian
Steger, Gerhard
Giegerich, Robert
author_facet Janssen, Stefan
Schudoma, Christian
Steger, Gerhard
Giegerich, Robert
author_sort Janssen, Stefan
collection PubMed
description BACKGROUND: Many bioinformatics tools for RNA secondary structure analysis are based on a thermodynamic model of RNA folding. They predict a single, "optimal" structure by free energy minimization, they enumerate near-optimal structures, they compute base pair probabilities and dot plots, representative structures of different abstract shapes, or Boltzmann probabilities of structures and shapes. Although all programs refer to the same physical model, they implement it with considerable variation for different tasks, and little is known about how the heuristic assumptions and model simplifications used by the programs affect the outcome of the analysis. RESULTS: We extract four different models of the thermodynamic folding space which underlie the programs RNAFOLD, RNASHAPES, and RNASUBOPT. Their differences lie in the details of the energy model and the granularity of the folding space. We implement probabilistic shape analysis for all models, and introduce the shape probability shift as a robust measure of model similarity. Using four data sets derived from experimentally solved structures, we provide a quantitative evaluation of the model differences. CONCLUSIONS: We find that search space granularity affects the computed shape probabilities less than the over- or underapproximation of free energy by a simplified energy model. Still, the approximations perform similarly enough to implementations of the full model to justify their continued use in settings where computational constraints call for simpler algorithms. As a side observation, the rarely used level 2 shapes, which predict the complete arrangement of helices, multiloops, internal loops and bulges, include the "true" shape in a rather small number of predicted high probability shapes. This calls for an investigation of new strategies to extract high probability members from the (very large) level 2 shape space of an RNA sequence.
We provide implementations of all four models, written in a declarative style that makes them easy to modify. Future work on thermodynamic RNA folding may choose a model based on our empirical data and take our implementations as a starting point for further program development.
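The abstract refers to Boltzmann probabilities of shapes and to a shift between the shape probabilities computed under two models. The sketch below illustrates the general idea only. It assumes a hypothetical, pre-enumerated list of (shape, free energy) pairs per model, and it uses half the summed absolute difference between two shape distributions as a stand-in distance; this record does not give the paper's exact definition of the shape probability shift, so that formula is an assumption. The function names and the toy energies are likewise illustrative.

```python
import math
from collections import defaultdict

# Gas constant (kcal/(mol*K)) times temperature (37 degrees C in kelvin).
RT = 0.0019872 * 310.15


def shape_probabilities(structures):
    """Sum Boltzmann weights exp(-E/RT) per shape and normalize.

    structures: iterable of (shape_label, free_energy_kcal_per_mol),
    assumed to be an enumeration of one model's folding space.
    """
    weights = defaultdict(float)
    for shape, energy in structures:
        weights[shape] += math.exp(-energy / RT)
    partition_sum = sum(weights.values())
    return {shape: w / partition_sum for shape, w in weights.items()}


def probability_shift(p, q):
    """Assumed distance: half the summed absolute difference (0 to 1)."""
    shapes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in shapes)


if __name__ == "__main__":
    # Hypothetical toy folding spaces for two models of the same sequence.
    model_a = [("[]", -3.0), ("[][]", -2.5), ("[]", -2.0)]
    model_b = [("[]", -2.0), ("[][]", -3.0)]
    p_a = shape_probabilities(model_a)
    p_b = shape_probabilities(model_b)
    print(p_a, p_b, probability_shift(p_a, p_b))
```

A value of 0 means both models assign identical shape probabilities; a value of 1 means they place all probability mass on disjoint sets of shapes.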
format Online
Article
Text
id pubmed-3293930
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3293930 2012-03-06 Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction Janssen, Stefan Schudoma, Christian Steger, Gerhard Giegerich, Robert BMC Bioinformatics Research Article BioMed Central 2011-11-03 /pmc/articles/PMC3293930/ /pubmed/22051375 http://dx.doi.org/10.1186/1471-2105-12-429 Text en Copyright ©2011 Janssen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Janssen, Stefan
Schudoma, Christian
Steger, Gerhard
Giegerich, Robert
Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction
title Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction
title_full Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction
title_fullStr Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction
title_full_unstemmed Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction
title_short Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction
title_sort lost in folding space? comparing four variants of the thermodynamic model for rna secondary structure prediction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3293930/
https://www.ncbi.nlm.nih.gov/pubmed/22051375
http://dx.doi.org/10.1186/1471-2105-12-429