NoFold: RNA structure clustering without folding or alignment

Structures that recur across multiple different transcripts, called structure motifs, often perform a similar function—for example, recruiting a specific RNA-binding protein that then regulates translation, splicing, or subcellular localization. Identifying common motifs between coregulated transcri...

Bibliographic Details
Main Authors:	Middleton, Sarah A.; Kim, Junhyong
Format:	Online Article Text
Language:	English
Published:	Cold Spring Harbor Laboratory Press 2014
Subjects:	Bioinformatics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4201820/ https://www.ncbi.nlm.nih.gov/pubmed/25234928 http://dx.doi.org/10.1261/rna.041913.113

_version_	1782340235128995840
author	Middleton, Sarah A.; Kim, Junhyong
collection	PubMed
description	Structures that recur across multiple different transcripts, called structure motifs, often perform a similar function—for example, recruiting a specific RNA-binding protein that then regulates translation, splicing, or subcellular localization. Identifying common motifs between coregulated transcripts may therefore yield significant insight into their binding partners and mechanism of regulation. However, as most methods for clustering structures are based on folding individual sequences or doing many pairwise alignments, this results in a tradeoff between speed and accuracy that can be problematic for large-scale data sets. Here we describe a novel method for comparing and characterizing RNA secondary structures that does not require folding or pairwise alignment of the input sequences. Our method uses the idea of constructing a distance function between two objects by their respective distances to a collection of empirical examples or models, which in our case consists of 1973 Rfam family covariance models. Using this as a basis for measuring structural similarity, we developed a clustering pipeline called NoFold to automatically identify and annotate structure motifs within large sequence data sets. We demonstrate that NoFold can simultaneously identify multiple structure motifs with an average sensitivity of 0.80 and precision of 0.98 and generally exceeds the performance of existing methods. We also perform a cross-validation analysis of the entire set of Rfam families, achieving an average sensitivity of 0.57. We apply NoFold to identify motifs enriched in dendritically localized transcripts and report 213 enriched motifs, including both known and novel structures.
format	Online Article Text
id	pubmed-4201820
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Cold Spring Harbor Laboratory Press
record_format	MEDLINE/PubMed
spelling	pubmed-4201820 2015-11-01 NoFold: RNA structure clustering without folding or alignment Middleton, Sarah A. Kim, Junhyong RNA Bioinformatics
Cold Spring Harbor Laboratory Press 2014-11 /pmc/articles/PMC4201820/ /pubmed/25234928 http://dx.doi.org/10.1261/rna.041913.113 Text en © 2014 Middleton and Kim; Published by Cold Spring Harbor Laboratory Press for the RNA Society http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by the RNA Society for the first 12 months after the full-issue publication date (see http://rnajournal.cshlp.org/site/misc/terms.xhtml). After 12 months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title	NoFold: RNA structure clustering without folding or alignment
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4201820/ https://www.ncbi.nlm.nih.gov/pubmed/25234928 http://dx.doi.org/10.1261/rna.041913.113
work_keys_str_mv	AT middletonsaraha nofoldrnastructureclusteringwithoutfoldingoralignment AT kimjunhyong nofoldrnastructureclusteringwithoutfoldingoralignment
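
The description above mentions measuring the distance between two objects by their respective distances to a collection of reference models. The following is a minimal sketch of that general idea only, not the NoFold implementation: each sequence is represented by a profile of scores against a fixed set of models, and two sequences are compared through their profiles. The score values, the sequence names, the choice of Euclidean distance, and the helper names `feature_distance` and `nearest_neighbor` are all assumptions made for illustration; the record does not say how scores are computed or normalized.

```python
import math


def feature_distance(scores_a, scores_b):
    """Euclidean distance between two score profiles of equal length.

    Each profile is assumed to hold one score per reference model,
    in the same model order for every sequence.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("profiles must be scored against the same models")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(scores_a, scores_b)))


def nearest_neighbor(name, profiles):
    """Return the name of the sequence whose profile is closest to `name`."""
    others = [other for other in profiles if other != name]
    return min(
        others,
        key=lambda other: feature_distance(profiles[name], profiles[other]),
    )


# Hypothetical scores of three sequences against four reference models.
# Real profiles would have one entry per model in the collection.
profiles = {
    "seq1": [5.0, 0.0, 1.0, 2.0],
    "seq2": [5.0, 0.0, 1.0, 3.0],
    "seq3": [0.0, 4.0, 3.0, 0.0],
}

if __name__ == "__main__":
    print(feature_distance(profiles["seq1"], profiles["seq2"]))
    print(nearest_neighbor("seq1", profiles))
```

In this toy data, seq1 and seq2 score similarly against every model, so their profiles lie close together, while seq3 lies far from both. No sequence is folded or aligned to another sequence at any point; only the profiles are compared.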
