Cargando…

Structator: fast index-based search for RNA sequence-structure patterns

BACKGROUND: The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires to match sequence-structure patterns. Unfortunately, current tools for this task have, in the be...

Descripción completa

Detalles Bibliográficos
Autores principales: Meyer, Fernando, Kurtz, Stefan, Backofen, Rolf, Will, Sebastian, Beckstette, Michael
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2011
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3154205/
https://www.ncbi.nlm.nih.gov/pubmed/21619640
http://dx.doi.org/10.1186/1471-2105-12-214
_version_ 1782209995238014976
author Meyer, Fernando
Kurtz, Stefan
Backofen, Rolf
Will, Sebastian
Beckstette, Michael
author_facet Meyer, Fernando
Kurtz, Stefan
Backofen, Rolf
Will, Sebastian
Beckstette, Michael
author_sort Meyer, Fernando
collection PubMed
description BACKGROUND: The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires to match sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. RESULTS: We present a novel method and readily applicable software for time efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This allows to exploit base pairing information for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. CONCLUSIONS: The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator.
format Online
Article
Text
id pubmed-3154205
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-31542052011-08-11 Structator: fast index-based search for RNA sequence-structure patterns Meyer, Fernando Kurtz, Stefan Backofen, Rolf Will, Sebastian Beckstette, Michael BMC Bioinformatics Methodology Article BACKGROUND: The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires to match sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. RESULTS: We present a novel method and readily applicable software for time efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This allows to exploit base pairing information for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. CONCLUSIONS: The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator. BioMed Central 2011-05-27 /pmc/articles/PMC3154205/ /pubmed/21619640 http://dx.doi.org/10.1186/1471-2105-12-214 Text en Copyright ©2011 Meyer et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Meyer, Fernando
Kurtz, Stefan
Backofen, Rolf
Will, Sebastian
Beckstette, Michael
Structator: fast index-based search for RNA sequence-structure patterns
title Structator: fast index-based search for RNA sequence-structure patterns
title_full Structator: fast index-based search for RNA sequence-structure patterns
title_fullStr Structator: fast index-based search for RNA sequence-structure patterns
title_full_unstemmed Structator: fast index-based search for RNA sequence-structure patterns
title_short Structator: fast index-based search for RNA sequence-structure patterns
title_sort structator: fast index-based search for rna sequence-structure patterns
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3154205/
https://www.ncbi.nlm.nih.gov/pubmed/21619640
http://dx.doi.org/10.1186/1471-2105-12-214
work_keys_str_mv AT meyerfernando structatorfastindexbasedsearchforrnasequencestructurepatterns
AT kurtzstefan structatorfastindexbasedsearchforrnasequencestructurepatterns
AT backofenrolf structatorfastindexbasedsearchforrnasequencestructurepatterns
AT willsebastian structatorfastindexbasedsearchforrnasequencestructurepatterns
AT beckstettemichael structatorfastindexbasedsearchforrnasequencestructurepatterns