RNAscClust: clustering RNA sequences using structure conservation and graph based motifs

MOTIVATION: Clustering RNA sequences with common secondary structure is an essential step towards studying RNA function. Whereas structural RNA alignment strategies typically identify common structure for orthologous structured RNAs, clustering seeks to group paralogous RNAs based on structural simi...

Bibliographic Details
Main authors: Miladi, Milad; Junge, Alexander; Costa, Fabrizio; Seemann, Stefan E; Havgaard, Jakob Hull; Gorodkin, Jan; Backofen, Rolf
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870858/
https://www.ncbi.nlm.nih.gov/pubmed/28334186
http://dx.doi.org/10.1093/bioinformatics/btx114
_version_ 1783309558286909440
author Miladi, Milad
Junge, Alexander
Costa, Fabrizio
Seemann, Stefan E
Havgaard, Jakob Hull
Gorodkin, Jan
Backofen, Rolf
author_sort Miladi, Milad
collection PubMed
description MOTIVATION: Clustering RNA sequences with common secondary structure is an essential step towards studying RNA function. Whereas structural RNA alignment strategies typically identify common structure for orthologous structured RNAs, clustering seeks to group paralogous RNAs based on structural similarities. However, existing approaches for clustering paralogous RNAs do not take into account the compensatory base pair changes obtained from structure conservation in orthologous sequences. RESULTS: Here, we present RNAscClust, the implementation of a new algorithm to cluster a set of structured RNAs taking their respective structural conservation into account. For a set of multiple structural alignments of RNA sequences, each containing a paralog sequence included in a structural alignment of its orthologs, RNAscClust computes minimum free-energy structures for each sequence using conserved base pairs as prior information for the folding. The paralogs are then clustered using a graph kernel-based strategy, which identifies common structural features. We show that the clustering accuracy clearly benefits from an increasing degree of compensatory base pair changes in the alignments. AVAILABILITY AND IMPLEMENTATION: RNAscClust is available at http://www.bioinf.uni-freiburg.de/Software/RNAscClust. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5870858
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5870858 2018-03-29 Bioinformatics Original Papers Oxford University Press 2017-07-15 2017-02-27 /pmc/articles/PMC5870858/ /pubmed/28334186 http://dx.doi.org/10.1093/bioinformatics/btx114 Text en © The Author 2017. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title RNAscClust: clustering RNA sequences using structure conservation and graph based motifs
topic Original Papers