Cargando…
ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs
BACKGROUND: Identifying sequence-structure motifs common to two RNAs can speed up the comparison of structural RNAs substantially. The core algorithm of the existent approach ExpaRNA solves this problem for a priori known input structures. However, such structures are rarely known; moreover, predict...
Autores principales: | , , , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2014
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302096/ https://www.ncbi.nlm.nih.gov/pubmed/25551362 http://dx.doi.org/10.1186/s12859-014-0404-0 |
_version_ | 1782353736154218496 |
---|---|
author | Otto, Christina Möhl, Mathias Heyne, Steffen Amit, Mika Landau, Gad M Backofen, Rolf Will, Sebastian |
author_facet | Otto, Christina Möhl, Mathias Heyne, Steffen Amit, Mika Landau, Gad M Backofen, Rolf Will, Sebastian |
author_sort | Otto, Christina |
collection | PubMed |
description | BACKGROUND: Identifying sequence-structure motifs common to two RNAs can speed up the comparison of structural RNAs substantially. The core algorithm of the existent approach ExpaRNA solves this problem for a priori known input structures. However, such structures are rarely known; moreover, predicting them computationally is no rescue, since single sequence structure prediction is highly unreliable. RESULTS: The novel algorithm ExpaRNA-P computes exactly matching sequence-structure motifs in entire Boltzmann-distributed structure ensembles of two RNAs; thereby we match and fold RNAs simultaneously, analogous to the well-known “simultaneous alignment and folding” of RNAs. While this implies much higher flexibility compared to ExpaRNA, ExpaRNA-P has the same very low complexity (quadratic in time and space), which is enabled by its novel structure ensemble-based sparsification. Furthermore, we devise a generalized chaining algorithm to compute compatible subsets of ExpaRNA-P’s sequence-structure motifs. Resulting in the very fast RNA alignment approach ExpLoc-P, we utilize the best chain as anchor constraints for the sequence-structure alignment tool LocARNA. ExpLoc-P is benchmarked in several variants and versus state-of-the-art approaches. In particular, we formally introduce and evaluate strict and relaxed variants of the problem; the latter makes the approach sensitive to compensatory mutations. Across a benchmark set of typical non-coding RNAs, ExpLoc-P has similar accuracy to LocARNA but is four times faster (in both variants), while it achieves a speed-up over 30-fold for the longest benchmark sequences (≈400nt). Finally, different ExpLoc-P variants enable tailoring of the method to specific application scenarios. ExpaRNA-P and ExpLoc-P are distributed as part of the LocARNA package. The source code is freely available at http://www.bioinf.uni-freiburg.de/Software/ExpaRNA-P. CONCLUSIONS: ExpaRNA-P’s novel ensemble-based sparsification reduces its complexity to quadratic time and space. Thereby, ExpaRNA-P significantly speeds up sequence-structure alignment while maintaining the alignment quality. Different ExpaRNA-P variants support a wide range of applications. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0404-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4302096 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-43020962015-02-03 ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs Otto, Christina Möhl, Mathias Heyne, Steffen Amit, Mika Landau, Gad M Backofen, Rolf Will, Sebastian BMC Bioinformatics Research Article BACKGROUND: Identifying sequence-structure motifs common to two RNAs can speed up the comparison of structural RNAs substantially. The core algorithm of the existent approach ExpaRNA solves this problem for a priori known input structures. However, such structures are rarely known; moreover, predicting them computationally is no rescue, since single sequence structure prediction is highly unreliable. RESULTS: The novel algorithm ExpaRNA-P computes exactly matching sequence-structure motifs in entire Boltzmann-distributed structure ensembles of two RNAs; thereby we match and fold RNAs simultaneously, analogous to the well-known “simultaneous alignment and folding” of RNAs. While this implies much higher flexibility compared to ExpaRNA, ExpaRNA-P has the same very low complexity (quadratic in time and space), which is enabled by its novel structure ensemble-based sparsification. Furthermore, we devise a generalized chaining algorithm to compute compatible subsets of ExpaRNA-P’s sequence-structure motifs. Resulting in the very fast RNA alignment approach ExpLoc-P, we utilize the best chain as anchor constraints for the sequence-structure alignment tool LocARNA. ExpLoc-P is benchmarked in several variants and versus state-of-the-art approaches. In particular, we formally introduce and evaluate strict and relaxed variants of the problem; the latter makes the approach sensitive to compensatory mutations. Across a benchmark set of typical non-coding RNAs, ExpLoc-P has similar accuracy to LocARNA but is four times faster (in both variants), while it achieves a speed-up over 30-fold for the longest benchmark sequences (≈400nt). Finally, different ExpLoc-P variants enable tailoring of the method to specific application scenarios. ExpaRNA-P and ExpLoc-P are distributed as part of the LocARNA package. The source code is freely available at http://www.bioinf.uni-freiburg.de/Software/ExpaRNA-P. CONCLUSIONS: ExpaRNA-P’s novel ensemble-based sparsification reduces its complexity to quadratic time and space. Thereby, ExpaRNA-P significantly speeds up sequence-structure alignment while maintaining the alignment quality. Different ExpaRNA-P variants support a wide range of applications. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0404-0) contains supplementary material, which is available to authorized users. BioMed Central 2014-12-31 /pmc/articles/PMC4302096/ /pubmed/25551362 http://dx.doi.org/10.1186/s12859-014-0404-0 Text en © Otto et al.; licensee BioMed Central. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Otto, Christina Möhl, Mathias Heyne, Steffen Amit, Mika Landau, Gad M Backofen, Rolf Will, Sebastian ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs |
title | ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs |
title_full | ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs |
title_fullStr | ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs |
title_full_unstemmed | ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs |
title_short | ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs |
title_sort | exparna-p: simultaneous exact pattern matching and folding of rnas |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302096/ https://www.ncbi.nlm.nih.gov/pubmed/25551362 http://dx.doi.org/10.1186/s12859-014-0404-0 |
work_keys_str_mv | AT ottochristina exparnapsimultaneousexactpatternmatchingandfoldingofrnas AT mohlmathias exparnapsimultaneousexactpatternmatchingandfoldingofrnas AT heynesteffen exparnapsimultaneousexactpatternmatchingandfoldingofrnas AT amitmika exparnapsimultaneousexactpatternmatchingandfoldingofrnas AT landaugadm exparnapsimultaneousexactpatternmatchingandfoldingofrnas AT backofenrolf exparnapsimultaneousexactpatternmatchingandfoldingofrnas AT willsebastian exparnapsimultaneousexactpatternmatchingandfoldingofrnas |