
MiRmat: Mature microRNA Sequence Prediction



Bibliographic Details
Main Authors: He, Chenfeng, Li, Ying-Xin, Zhang, Guangxin, Gu, Zuguang, Yang, Rong, Li, Jie, Lu, Zhi John, Zhou, Zhi-Hua, Zhang, Chenyu, Wang, Jin
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531441/
https://www.ncbi.nlm.nih.gov/pubmed/23300555
http://dx.doi.org/10.1371/journal.pone.0051673
Description
Summary: BACKGROUND: MicroRNAs are known to be generated from primary transcripts mainly through sequential cleavage by two enzymes, Drosha and Dicer. The sequence of a mature microRNA, especially its ‘seed sequence’, largely determines its binding ability and specificity to target mRNAs. Therefore, methods that predict mature microRNA sequences with high accuracy will benefit the identification and characterization of novel microRNAs and their targets, and will help infer the post-transcriptional regulatory network at a genome scale.
METHODOLOGY/PRINCIPAL FINDINGS: We have developed a method, MiRmat, to predict the mature microRNA sequence. MiRmat consists of two parts: prediction of the Drosha processing site and identification of the Dicer processing site. Based on an analysis of microRNAs from 12 species, we found that the patterns of free energy profiles are conserved among vertebrate microRNA hairpins. We therefore incorporated into our method the free energy distribution pattern of the downstream part of the pri-microRNA secondary structure, together with the Random Forest algorithm, to predict the mature microRNA sequence. In an evaluation on an independent test dataset from 10 vertebrates, MiRmat identified 77.8% of the Drosha processing sites and 92.8% of the Dicer processing sites within a deviation of 2 nt. In a more stringent evaluation that excluded microRNAs whose family is shared between the training set and the test set, MiRmat maintained identification rates of 71.9% on Drosha sites and 87.2% on Dicer sites, demonstrating its ability to handle novel microRNA families. MiRmat outperforms other state-of-the-art methods and is highly effective at predicting the mature microRNA sequences of vertebrates.
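The evaluation above counts a processing site as identified when the predicted position falls within 2 nt of the annotated one. The identification rate can be sketched in Python as follows, assuming that sites are integer nucleotide positions, that "within 2 nt" is inclusive, and that a missing prediction counts as a miss. The function name `identification_rate` and the toy positions are hypothetical, not taken from MiRmat or its datasets.

```python
from typing import Optional, Sequence


def identification_rate(
    predicted: Sequence[Optional[int]],
    annotated: Sequence[int],
    tolerance: int = 2,
) -> float:
    """Fraction of annotated processing sites matched by a prediction.

    Hypothetical helper (not part of MiRmat): a site counts as identified
    when |predicted - annotated| <= tolerance nucleotides (assumed inclusive).
    A missing prediction (None) counts as a miss.
    """
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    if not annotated:
        return 0.0
    hits = sum(
        1
        for p, a in zip(predicted, annotated)
        if p is not None and abs(p - a) <= tolerance
    )
    return hits / len(annotated)


if __name__ == "__main__":
    # Toy site positions (nt), invented for illustration only.
    drosha_pred = [10, 55, 23, None]
    drosha_true = [12, 50, 23, 71]
    # Offsets 2, 5, 0 and a missing prediction: 2 of 4 sites are hits.
    print(identification_rate(drosha_pred, drosha_true))  # 0.5
```

Passing `tolerance=0` gives the exact-match rate on the same data, which is a quick way to see how much a metric like this depends on the chosen deviation window.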
CONCLUSION: MiRmat was developed to identify mature microRNA sequence(s) by introducing the free energy distribution of the RNA stem-loop structure and the Random Forest algorithm. We show that MiRmat performs better than existing tools and is applicable across vertebrates. MiRmat is freely available at http://mcube.nju.edu.cn/jwang/lab/soft/MiRmat/.
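Once the Drosha and Dicer sites are known, the mature sequence is the segment of the hairpin arm between the two cuts, and the seed sequence highlighted in the background follows directly from it. The sketch below illustrates only this final step, under the following assumptions. Cut positions are 0-based indices between nucleotides. The seed is taken as nucleotides 2-8, a widely used convention. The helpers `mature_sequence` and `seed_region`, and the flanking bases of the toy pri-microRNA, are hypothetical. This is not MiRmat's code: MiRmat's own contribution is predicting the sites themselves from free energy profiles with a Random Forest.

```python
def mature_sequence(pri_seq: str, drosha_cut: int, dicer_cut: int) -> str:
    """Return the arm segment between two cleavage sites.

    Hypothetical helper (not MiRmat's code). Cuts are 0-based positions
    between nucleotides: a cut at k falls between pri_seq[k-1] and pri_seq[k].
    On a 5p arm the Drosha cut lies upstream of the Dicer cut; on a 3p arm
    the order is reversed, so the two cuts are sorted first.
    """
    lo, hi = sorted((drosha_cut, dicer_cut))
    if not 0 <= lo < hi <= len(pri_seq):
        raise ValueError("cuts must satisfy 0 <= lo < hi <= len(pri_seq)")
    return pri_seq[lo:hi]


def seed_region(mature: str) -> str:
    """Nucleotides 2-8 of the mature microRNA (a common seed definition)."""
    if len(mature) < 8:
        raise ValueError("mature sequence too short to contain a seed")
    return mature[1:8]


if __name__ == "__main__":
    # Toy pri-microRNA: invented flanks around the let-7a mature sequence.
    pri = "GGGG" + "UGAGGUAGUAGGUUGUAUAGUU" + "CCCC"
    mature = mature_sequence(pri, drosha_cut=4, dicer_cut=26)
    print(mature)               # UGAGGUAGUAGGUUGUAUAGUU
    print(seed_region(mature))  # GAGGUAG
```

Sorting the two cuts lets the same helper serve both 5p-arm and 3p-arm microRNAs. A real pipeline would also have to decide which arm carries the mature strand, which this sketch leaves to the caller.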