Transcriptome-Wide Annotation of m(5)C RNA Modifications Using Machine Learning


Bibliographic Details
Main authors: Song, Jie, Zhai, Jingjing, Bian, Enze, Song, Yujia, Yu, Jiantao, Ma, Chuang
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5915569/
https://www.ncbi.nlm.nih.gov/pubmed/29720995
http://dx.doi.org/10.3389/fpls.2018.00519
Description
Summary: The emergence of the epitranscriptome has opened a new chapter in gene regulation. 5-Methylcytosine (m(5)C), an important post-transcriptional modification, has been implicated in a variety of biological processes, such as subcellular localization and translational fidelity. Although high-throughput experimental technologies have been developed and applied to profile m(5)C modifications under certain conditions, transcriptome-wide studies of m(5)C modifications are still hindered by the dynamic nature of m(5)C and the lack of computational prediction methods. In this study, we introduced PEA-m5C, a machine learning-based m(5)C predictor trained with features extracted from the flanking sequences of m(5)C modifications. PEA-m5C yielded an average AUC (area under the receiver operating characteristic curve) of 0.939 in 10-fold cross-validation experiments based on known Arabidopsis m(5)C modifications. A rigorous independent test showed that PEA-m5C (accuracy [Acc] = 0.835, Matthews correlation coefficient [MCC] = 0.688) substantially outperforms the recently developed m(5)C predictor iRNAm5C-PseDNC (Acc = 0.665, MCC = 0.332). PEA-m5C has been applied to predict candidate m(5)C modifications in annotated Arabidopsis transcripts. Further analysis of these candidates showed that the position 4 nt downstream of the translational start site is the most frequently methylated. PEA-m5C is freely available to academic users at: https://github.com/cma2015/PEA-m5C.
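The summary reports performance as AUC, accuracy (Acc), and the Matthews correlation coefficient (MCC). The sketch below shows how these standard binary-classification metrics are conventionally computed from true labels, predicted labels, and prediction scores. It is not code from PEA-m5C. The function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical and serve only as an illustration; they are not data from the study.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 = m(5)C site, 0 = non-site."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Acc = (TP + TN) / (TP + TN + FP + FN)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0.0 if undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative
    (Mann-Whitney formulation; ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 3 true sites, 3 non-sites, scores from some classifier.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"Acc = {accuracy(y_true, y_pred):.3f}")  # Acc = 0.667
print(f"MCC = {mcc(y_true, y_pred):.3f}")       # MCC = 0.333
print(f"AUC = {roc_auc(y_true, scores):.3f}")   # AUC = 0.889
```

AUC is computed from the raw scores and is therefore threshold-independent, whereas Acc and MCC depend on the chosen cut-off. MCC stays informative when positive and negative sites are imbalanced. In a 10-fold cross-validation such as the one described above, the AUC would be computed on each held-out fold and then averaged.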