
The discriminant power of RNA features for pre-miRNA recognition


Bibliographic details
Main authors:	de ON Lopes, Ivani; Schliep, Alexander; de LF de Carvalho, André CP
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4046174/
	https://www.ncbi.nlm.nih.gov/pubmed/24884650
	http://dx.doi.org/10.1186/1471-2105-15-124

Abstract

BACKGROUND: Computational discovery of microRNAs (miRNA) is based on pre-determined sets of features from miRNA precursors (pre-miRNA). Some feature sets are composed of sequence-structure patterns commonly found in pre-miRNAs, while others are a combination of more sophisticated RNA features. In this work, we analyze the discriminant power of seven feature sets, which are used in six pre-miRNA prediction tools. The analysis is based on the classification performance achieved with these feature sets for the training algorithms used in these tools. We also evaluate feature discrimination through the F-score and feature importance in the induction of random forests.

RESULTS: Small or non-significant differences were found among the estimated classification performances of classifiers induced from feature sets containing diverse features, despite the wide differences in their dimension. Inspired by these results, we obtained a lower-dimensional feature set, which achieved a sensitivity of 90% and a specificity of 95%. These estimates are within 0.1% of the maximal values obtained with any feature set (SELECT, Section "Results and discussion"), while it is 34 times faster to compute. Even compared to another feature set (FS2, see Section "Results and discussion"), which is the computationally least expensive feature set from the literature that performs within 0.1% of the maximal values, it is 34 times faster to compute. The results obtained by the tools used as references in the experiments showed that five of these six tools have lower sensitivity or specificity.

CONCLUSION: In miRNA discovery, the number of putative miRNA loci is on the order of millions. Analysis of putative pre-miRNAs using a computationally expensive feature set would be wasteful or even infeasible for large genomes. In this work, we propose a relatively inexpensive feature set and explore most of the learning aspects implemented in current ab-initio pre-miRNA prediction tools, which may lead to the development of efficient ab-initio pre-miRNA discovery tools. The material to reproduce the main results from this paper can be downloaded from http://bioinformatics.rutgers.edu/Static/Software/discriminant.tar.gz.
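The abstract ranks features by their F-score and reports sensitivity and specificity, but it does not define these quantities. The sketch below assumes the commonly used feature F-score of Chen and Lin (squared distances of the two class means from the overall mean, divided by the sum of the two within-class sample variances) and the standard definitions sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The function names `f_score` and `sensitivity_specificity` are hypothetical and do not come from the authors' downloadable material.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of a single feature (assumed Chen and Lin definition).

    pos: feature values for positive examples (e.g. real pre-miRNAs)
    neg: feature values for negative examples (e.g. pseudo hairpins)
    Each class needs at least two values for the sample variance.
    """
    m_all = mean(list(pos) + list(neg))
    m_pos, m_neg = mean(pos), mean(neg)
    # Between-class term: how far each class mean lies from the overall mean.
    between = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
    # Within-class term: statistics.variance uses the n - 1 denominator.
    within = variance(pos) + variance(neg)
    return between / within


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels 1 = positive, 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # A well-separated toy feature scores high; an uninformative one scores 0.
    print(f_score([1.0, 2.0, 3.0], [5.0, 6.0, 7.0]))  # 4.0
    print(f_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```

A larger F-score means the feature separates the two classes better; since it is computed one feature at a time, it ignores interactions between features, which is one reason to complement it with random-forest feature importance as the abstract describes.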
Journal:	BMC Bioinformatics
Publication date:	2014-05-02
Rights:	Copyright © 2014 Lopes et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

