NoFold: RNA structure clustering without folding or alignment
Structures that recur across multiple different transcripts, called structure motifs, often perform a similar function—for example, recruiting a specific RNA-binding protein that then regulates translation, splicing, or subcellular localization. Identifying common motifs between coregulated transcri...
Main Authors: | Middleton, Sarah A., Kim, Junhyong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press 2014 |
Subjects: | Bioinformatics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4201820/ https://www.ncbi.nlm.nih.gov/pubmed/25234928 http://dx.doi.org/10.1261/rna.041913.113 |
_version_ | 1782340235128995840 |
---|---|
author | Middleton, Sarah A. Kim, Junhyong |
author_facet | Middleton, Sarah A. Kim, Junhyong |
author_sort | Middleton, Sarah A. |
collection | PubMed |
description | Structures that recur across multiple different transcripts, called structure motifs, often perform a similar function—for example, recruiting a specific RNA-binding protein that then regulates translation, splicing, or subcellular localization. Identifying common motifs between coregulated transcripts may therefore yield significant insight into their binding partners and mechanism of regulation. However, as most methods for clustering structures are based on folding individual sequences or doing many pairwise alignments, this results in a tradeoff between speed and accuracy that can be problematic for large-scale data sets. Here we describe a novel method for comparing and characterizing RNA secondary structures that does not require folding or pairwise alignment of the input sequences. Our method uses the idea of constructing a distance function between two objects by their respective distances to a collection of empirical examples or models, which in our case consists of 1973 Rfam family covariance models. Using this as a basis for measuring structural similarity, we developed a clustering pipeline called NoFold to automatically identify and annotate structure motifs within large sequence data sets. We demonstrate that NoFold can simultaneously identify multiple structure motifs with an average sensitivity of 0.80 and precision of 0.98 and generally exceeds the performance of existing methods. We also perform a cross-validation analysis of the entire set of Rfam families, achieving an average sensitivity of 0.57. We apply NoFold to identify motifs enriched in dendritically localized transcripts and report 213 enriched motifs, including both known and novel structures. |
format | Online Article Text |
id | pubmed-4201820 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Cold Spring Harbor Laboratory Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4201820 2015-11-01 NoFold: RNA structure clustering without folding or alignment Middleton, Sarah A. Kim, Junhyong RNA Bioinformatics Cold Spring Harbor Laboratory Press 2014-11 /pmc/articles/PMC4201820/ /pubmed/25234928 http://dx.doi.org/10.1261/rna.041913.113 Text en © 2014 Middleton and Kim; Published by Cold Spring Harbor Laboratory Press for the RNA Society http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by the RNA Society for the first 12 months after the full-issue publication date (see http://rnajournal.cshlp.org/site/misc/terms.xhtml). After 12 months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/. |
spellingShingle | Bioinformatics Middleton, Sarah A. Kim, Junhyong NoFold: RNA structure clustering without folding or alignment |
title | NoFold: RNA structure clustering without folding or alignment |
title_full | NoFold: RNA structure clustering without folding or alignment |
title_fullStr | NoFold: RNA structure clustering without folding or alignment |
title_full_unstemmed | NoFold: RNA structure clustering without folding or alignment |
title_short | NoFold: RNA structure clustering without folding or alignment |
title_sort | nofold: rna structure clustering without folding or alignment |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4201820/ https://www.ncbi.nlm.nih.gov/pubmed/25234928 http://dx.doi.org/10.1261/rna.041913.113 |
work_keys_str_mv | AT middletonsaraha nofoldrnastructureclusteringwithoutfoldingoralignment AT kimjunhyong nofoldrnastructureclusteringwithoutfoldingoralignment |
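
The description field above outlines the central idea: two sequences are compared through their respective distances (scores) to a shared collection of reference models, rather than by folding or aligning them directly. The following is a minimal sketch of that idea only, not the NoFold pipeline itself. The score values, the sequence names, the use of three models instead of 1973, the Euclidean metric, and the single-linkage cutoff are all assumptions made for illustration; the record does not specify them.

```python
import math

# Hypothetical scores of four sequences against three reference models.
# All names and numbers are invented for illustration; the real method
# scores sequences against 1973 Rfam covariance models.
SCORES = {
    "seq_a": [12.0, -3.0, 0.5],
    "seq_b": [11.0, -2.5, 1.0],
    "seq_c": [-4.0, 9.0, 8.0],
    "seq_d": [-5.0, 10.0, 7.5],
}


def euclidean(u, v):
    """Euclidean distance between two equal-length score vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_distances(scores):
    """Map each unordered pair of names to the distance between their score vectors."""
    names = sorted(scores)
    return {
        (p, q): euclidean(scores[p], scores[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


def threshold_clusters(scores, cutoff):
    """Single-linkage grouping: join any two sequences closer than `cutoff`."""
    parent = {name: name for name in scores}

    def find(x):
        # Follow parent links to the group representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (p, q), d in pairwise_distances(scores).items():
        if d < cutoff:
            parent[find(p)] = find(q)

    groups = {}
    for name in scores:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    for pair, d in pairwise_distances(SCORES).items():
        print(pair, round(d, 2))
    print(threshold_clusters(SCORES, cutoff=3.0))
```

In this toy data, sequences with similar score profiles across the models end up close together, so no sequence is ever folded or aligned against another; only the per-model scores are compared.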