Cargando…

ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs

BACKGROUND: Identifying sequence-structure motifs common to two RNAs can speed up the comparison of structural RNAs substantially. The core algorithm of the existent approach ExpaRNA solves this problem for a priori known input structures. However, such structures are rarely known; moreover, predict...

Descripción completa

Detalles Bibliográficos
Autores principales: Otto, Christina, Möhl, Mathias, Heyne, Steffen, Amit, Mika, Landau, Gad M, Backofen, Rolf, Will, Sebastian
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2014
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302096/
https://www.ncbi.nlm.nih.gov/pubmed/25551362
http://dx.doi.org/10.1186/s12859-014-0404-0
_version_ 1782353736154218496
author Otto, Christina
Möhl, Mathias
Heyne, Steffen
Amit, Mika
Landau, Gad M
Backofen, Rolf
Will, Sebastian
author_facet Otto, Christina
Möhl, Mathias
Heyne, Steffen
Amit, Mika
Landau, Gad M
Backofen, Rolf
Will, Sebastian
author_sort Otto, Christina
collection PubMed
description BACKGROUND: Identifying sequence-structure motifs common to two RNAs can speed up the comparison of structural RNAs substantially. The core algorithm of the existent approach ExpaRNA solves this problem for a priori known input structures. However, such structures are rarely known; moreover, predicting them computationally is no rescue, since single sequence structure prediction is highly unreliable. RESULTS: The novel algorithm ExpaRNA-P computes exactly matching sequence-structure motifs in entire Boltzmann-distributed structure ensembles of two RNAs; thereby we match and fold RNAs simultaneously, analogous to the well-known “simultaneous alignment and folding” of RNAs. While this implies much higher flexibility compared to ExpaRNA, ExpaRNA-P has the same very low complexity (quadratic in time and space), which is enabled by its novel structure ensemble-based sparsification. Furthermore, we devise a generalized chaining algorithm to compute compatible subsets of ExpaRNA-P’s sequence-structure motifs. Resulting in the very fast RNA alignment approach ExpLoc-P, we utilize the best chain as anchor constraints for the sequence-structure alignment tool LocARNA. ExpLoc-P is benchmarked in several variants and versus state-of-the-art approaches. In particular, we formally introduce and evaluate strict and relaxed variants of the problem; the latter makes the approach sensitive to compensatory mutations. Across a benchmark set of typical non-coding RNAs, ExpLoc-P has similar accuracy to LocARNA but is four times faster (in both variants), while it achieves a speed-up over 30-fold for the longest benchmark sequences (≈400nt). Finally, different ExpLoc-P variants enable tailoring of the method to specific application scenarios. ExpaRNA-P and ExpLoc-P are distributed as part of the LocARNA package. The source code is freely available at http://www.bioinf.uni-freiburg.de/Software/ExpaRNA-P. CONCLUSIONS: ExpaRNA-P’s novel ensemble-based sparsification reduces its complexity to quadratic time and space. Thereby, ExpaRNA-P significantly speeds up sequence-structure alignment while maintaining the alignment quality. Different ExpaRNA-P variants support a wide range of applications. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0404-0) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4302096
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-43020962015-02-03 ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs Otto, Christina Möhl, Mathias Heyne, Steffen Amit, Mika Landau, Gad M Backofen, Rolf Will, Sebastian BMC Bioinformatics Research Article BACKGROUND: Identifying sequence-structure motifs common to two RNAs can speed up the comparison of structural RNAs substantially. The core algorithm of the existent approach ExpaRNA solves this problem for a priori known input structures. However, such structures are rarely known; moreover, predicting them computationally is no rescue, since single sequence structure prediction is highly unreliable. RESULTS: The novel algorithm ExpaRNA-P computes exactly matching sequence-structure motifs in entire Boltzmann-distributed structure ensembles of two RNAs; thereby we match and fold RNAs simultaneously, analogous to the well-known “simultaneous alignment and folding” of RNAs. While this implies much higher flexibility compared to ExpaRNA, ExpaRNA-P has the same very low complexity (quadratic in time and space), which is enabled by its novel structure ensemble-based sparsification. Furthermore, we devise a generalized chaining algorithm to compute compatible subsets of ExpaRNA-P’s sequence-structure motifs. Resulting in the very fast RNA alignment approach ExpLoc-P, we utilize the best chain as anchor constraints for the sequence-structure alignment tool LocARNA. ExpLoc-P is benchmarked in several variants and versus state-of-the-art approaches. In particular, we formally introduce and evaluate strict and relaxed variants of the problem; the latter makes the approach sensitive to compensatory mutations. Across a benchmark set of typical non-coding RNAs, ExpLoc-P has similar accuracy to LocARNA but is four times faster (in both variants), while it achieves a speed-up over 30-fold for the longest benchmark sequences (≈400nt). Finally, different ExpLoc-P variants enable tailoring of the method to specific application scenarios. ExpaRNA-P and ExpLoc-P are distributed as part of the LocARNA package. The source code is freely available at http://www.bioinf.uni-freiburg.de/Software/ExpaRNA-P. CONCLUSIONS: ExpaRNA-P’s novel ensemble-based sparsification reduces its complexity to quadratic time and space. Thereby, ExpaRNA-P significantly speeds up sequence-structure alignment while maintaining the alignment quality. Different ExpaRNA-P variants support a wide range of applications. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0404-0) contains supplementary material, which is available to authorized users. BioMed Central 2014-12-31 /pmc/articles/PMC4302096/ /pubmed/25551362 http://dx.doi.org/10.1186/s12859-014-0404-0 Text en © Otto et al.; licensee BioMed Central. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Otto, Christina
Möhl, Mathias
Heyne, Steffen
Amit, Mika
Landau, Gad M
Backofen, Rolf
Will, Sebastian
ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs
title ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs
title_full ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs
title_fullStr ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs
title_full_unstemmed ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs
title_short ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs
title_sort exparna-p: simultaneous exact pattern matching and folding of rnas
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302096/
https://www.ncbi.nlm.nih.gov/pubmed/25551362
http://dx.doi.org/10.1186/s12859-014-0404-0
work_keys_str_mv AT ottochristina exparnapsimultaneousexactpatternmatchingandfoldingofrnas
AT mohlmathias exparnapsimultaneousexactpatternmatchingandfoldingofrnas
AT heynesteffen exparnapsimultaneousexactpatternmatchingandfoldingofrnas
AT amitmika exparnapsimultaneousexactpatternmatchingandfoldingofrnas
AT landaugadm exparnapsimultaneousexactpatternmatchingandfoldingofrnas
AT backofenrolf exparnapsimultaneousexactpatternmatchingandfoldingofrnas
AT willsebastian exparnapsimultaneousexactpatternmatchingandfoldingofrnas