
NoFold: RNA structure clustering without folding or alignment


Bibliographic Details
Main Authors: Middleton, Sarah A.; Kim, Junhyong
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2014
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4201820/
https://www.ncbi.nlm.nih.gov/pubmed/25234928
http://dx.doi.org/10.1261/rna.041913.113
author Middleton, Sarah A.
Kim, Junhyong
collection PubMed
description Structures that recur across multiple different transcripts, called structure motifs, often perform a similar function, such as recruiting a specific RNA-binding protein that then regulates translation, splicing, or subcellular localization. Identifying common motifs between coregulated transcripts may therefore yield significant insight into their binding partners and mechanism of regulation. However, most methods for clustering structures rely on folding individual sequences or performing many pairwise alignments, which creates a tradeoff between speed and accuracy that can be problematic for large-scale data sets. Here we describe a novel method for comparing and characterizing RNA secondary structures that does not require folding or pairwise alignment of the input sequences. Our method constructs a distance function between two objects from their respective distances to a collection of empirical examples or models, which in our case consists of 1973 Rfam family covariance models. Using this as a basis for measuring structural similarity, we developed a clustering pipeline called NoFold to automatically identify and annotate structure motifs within large sequence data sets. We demonstrate that NoFold can simultaneously identify multiple structure motifs with an average sensitivity of 0.80 and precision of 0.98, and that it generally exceeds the performance of existing methods. We also perform a cross-validation analysis of the entire set of Rfam families, achieving an average sensitivity of 0.57. We apply NoFold to identify motifs enriched in dendritically localized transcripts and report 213 enriched motifs, including both known and novel structures.
format Online
Article
Text
id pubmed-4201820
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-4201820 2015-11-01 NoFold: RNA structure clustering without folding or alignment Middleton, Sarah A. Kim, Junhyong RNA Bioinformatics
Cold Spring Harbor Laboratory Press 2014-11 /pmc/articles/PMC4201820/ /pubmed/25234928 http://dx.doi.org/10.1261/rna.041913.113 Text en © 2014 Middleton and Kim; Published by Cold Spring Harbor Laboratory Press for the RNA Society http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by the RNA Society for the first 12 months after the full-issue publication date (see http://rnajournal.cshlp.org/site/misc/terms.xhtml). After 12 months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title NoFold: RNA structure clustering without folding or alignment
topic Bioinformatics
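
The description in this record says the method measures the distance between two sequences through their respective distances to a shared collection of reference models. The sketch below illustrates only that general idea and is not the NoFold implementation. The score values, the sequence names, the use of four models instead of 1973, and the choice of Euclidean distance are all assumptions made for illustration; the record does not state which metric or normalization the authors use.

```python
import math

# Hypothetical scores of each sequence against four reference models.
# The record mentions 1973 Rfam covariance models; these numbers are
# invented purely for illustration.
SCORES = {
    "seq_a": [12.0, -3.0, 0.5, 7.0],
    "seq_b": [11.0, -2.5, 1.0, 6.5],
    "seq_c": [-4.0, 9.0, 8.0, -1.0],
}


def model_space_distance(u, v):
    """Euclidean distance between two score vectors (the metric is an assumption)."""
    if len(u) != len(v):
        raise ValueError("score vectors must cover the same models")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def nearest(name, scores):
    """Return the name of the other sequence closest to `name` in model space."""
    others = (k for k in scores if k != name)
    return min(others, key=lambda k: model_space_distance(scores[name], scores[k]))


if __name__ == "__main__":
    for name in SCORES:
        print(name, "is closest to", nearest(name, SCORES))
```

Because each sequence is compared only through its fixed-length score vector, no sequence is folded and no pair of sequences is aligned directly; a clustering step could then operate on these vectors.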