Fast and accurate clustering of noncoding RNAs using ensembles of sequence alignments and secondary structures

BACKGROUND: Clustering of unannotated transcripts is an important task to identify novel families of noncoding RNAs (ncRNAs). Several hierarchical clustering methods have been developed using similarity measures based on the scores of structural alignment. However, the high computational cost of exa...

Bibliographic Details
Main authors:	Saito, Yutaka; Sato, Kengo; Sakakibara, Yasubumi
Format:	Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3044305/ https://www.ncbi.nlm.nih.gov/pubmed/21342580 http://dx.doi.org/10.1186/1471-2105-12-S1-S48

collection	PubMed
description	BACKGROUND: Clustering of unannotated transcripts is an important task to identify novel families of noncoding RNAs (ncRNAs). Several hierarchical clustering methods have been developed using similarity measures based on the scores of structural alignment. However, the high computational cost of exact structural alignment requires these methods to employ approximate algorithms. Such heuristics degrade the quality of clustering results, especially when the similarity among family members is not detectable at the primary sequence level. RESULTS: We describe a new similarity measure for the hierarchical clustering of ncRNAs. The idea is that the reliability of approximate algorithms can be improved by utilizing the information of suboptimal solutions in their dynamic programming frameworks. We approximate structural alignment in a more simplified manner than the existing methods. Instead, our method utilizes all possible sequence alignments and all possible secondary structures, whereas the existing methods only use one optimal sequence alignment and one optimal secondary structure. We demonstrate that this strategy can achieve the best balance between the computational cost and the quality of the clustering. In particular, our method can keep its high performance even when the sequence identity of family members is less than 60%. CONCLUSIONS: Our method enables fast and accurate clustering of ncRNAs. The software is available for download at http://bpla-kernel.dna.bio.keio.ac.jp/clustering/.
id	pubmed-3044305
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-3044305 2011-02-25 BMC Bioinformatics, BioMed Central, 2011-02-15. Text, en. Copyright ©2011 Saito et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Similar items