Inferring Noncoding RNA Families and Classes by Means of Genome-Scale Structure-Based Clustering

The RFAM database defines families of ncRNAs by means of sequence similarities that are sufficient to establish homology. In some cases, such as microRNAs and box H/ACA snoRNAs, functional commonalities define classes of RNAs that are characterized by structural similarities, and typically consist o...

Bibliographic Details
Main authors: Will, Sebastian; Reiche, Kristin; Hofacker, Ivo L; Stadler, Peter F; Backofen, Rolf
Format: Text
Language: English
Published: Public Library of Science 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1851984/
https://www.ncbi.nlm.nih.gov/pubmed/17432929
http://dx.doi.org/10.1371/journal.pcbi.0030065
author Will, Sebastian
Reiche, Kristin
Hofacker, Ivo L
Stadler, Peter F
Backofen, Rolf
collection PubMed
description The RFAM database defines families of ncRNAs by means of sequence similarities that are sufficient to establish homology. In some cases, such as microRNAs and box H/ACA snoRNAs, functional commonalities define classes of RNAs that are characterized by structural similarities, and typically consist of multiple RNA families. Recent advances in high-throughput transcriptomics and comparative genomics have produced very large sets of putative noncoding RNAs and regulatory RNA signals. For many of them, evidence for stabilizing selection acting on their secondary structures has been derived, and at least approximate models of their structures have been computed. The overwhelming majority of these hypothetical RNAs cannot be assigned to established families or classes. We present here a structure-based clustering approach that is capable of extracting putative RNA classes from genome-wide surveys for structured RNAs. The LocARNA (local alignment of RNA) tool implements a novel variant of the Sankoff algorithm that is sufficiently fast to deal with several thousand candidate sequences. The method is also robust against false positive predictions, i.e., a contamination of the input data with unstructured or nonconserved sequences. We have successfully tested the LocARNA-based clustering approach on the sequences of the RFAM-seed alignments. Furthermore, we have applied it to a previously published set of 3,332 predicted structured elements in the Ciona intestinalis genome (Missal K, Rose D, Stadler PF (2005) Noncoding RNAs in Ciona intestinalis. Bioinformatics 21 (Supplement 2): i77–i78). In addition to recovering, e.g., tRNAs as a structure-based class, the method identifies several RNA families, including microRNA and snoRNA candidates, and suggests several novel classes of ncRNAs for which to date no representative has been experimentally characterized.
format Text
id pubmed-1851984
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-1851984
2007-04-13
PLoS Comput Biol
Research Article
Public Library of Science
2007-04
/pmc/articles/PMC1851984/
/pubmed/17432929
http://dx.doi.org/10.1371/journal.pcbi.0030065
Text
en
© 2007 Will et al.
http://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Inferring Noncoding RNA Families and Classes by Means of Genome-Scale Structure-Based Clustering
topic Research Article