Prediction of m5C Modifications in RNA Sequences by Combining Multiple Sequence Features

5-Methylcytosine (m5C) is a well-known post-transcriptional modification that plays significant roles in biological processes such as RNA metabolism, tRNA recognition, and stress responses. Traditional high-throughput techniques for identifying m5C sites are usually time-consuming and expensiv...

Bibliographic Details
Main Authors: Dou, Lijun, Li, Xiaoling, Ding, Hui, Xu, Lei, Xiang, Huaikun
Format: Online Article Text
Language: English
Published: American Society of Gene & Cell Therapy 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7340967/
https://www.ncbi.nlm.nih.gov/pubmed/32645685
http://dx.doi.org/10.1016/j.omtn.2020.06.004
author Dou, Lijun
Li, Xiaoling
Ding, Hui
Xu, Lei
Xiang, Huaikun
collection PubMed
description 5-Methylcytosine (m5C) is a well-known post-transcriptional modification that plays significant roles in biological processes such as RNA metabolism, tRNA recognition, and stress responses. Traditional high-throughput techniques for identifying m5C sites are usually time-consuming and expensive. In addition, the number of RNA sequences has grown explosively in the post-genomic era. Thus, machine-learning-based methods are urgently needed to predict RNA m5C modifications quickly and accurately. Here, we propose a novel support-vector-machine (SVM)-based tool, called iRNA-m5C_SVM, that combines multiple sequence features to identify m5C sites in Arabidopsis thaliana. Eight popular feature-extraction methods were first investigated systematically. Then, four well-performing features were combined to construct a comprehensive model: position-specific propensity (PSP) (PSNP, PSDP, and PSTP, associated with the frequencies of nucleotides, dinucleotides, and trinucleotides, respectively), nucleotide composition (nucleic acid, di-nucleotide, and tri-nucleotide compositions; NAC, DNC, and TNC, respectively), electron-ion interaction pseudopotentials of trinucleotide (PseEIIPs), and general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-general). Evaluated by 10-fold cross-validation and independent tests, the model achieved accuracies of 73.06% and 80.15%, respectively, the best predictive performance in A. thaliana among existing models. The proposed model may be a promising alternative for further research on m5C modification sites in plants.
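The nucleotide composition features named in the description (NAC, DNC, and TNC) can be illustrated with a minimal sketch. It assumes each feature is the normalized frequency of every possible k-mer over the alphabet A, C, G, U for k = 1, 2, 3; the function names, the normalization, and the handling of ambiguous symbols are illustrative assumptions, not taken from the iRNA-m5C_SVM implementation.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; T is mapped to U below (an assumption)


def kmer_composition(seq, k):
    """Return normalized k-mer frequencies over ACGU in lexicographic order."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous symbols such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def composition_features(seq):
    """Concatenate NAC (k=1), DNC (k=2), and TNC (k=3): 4 + 16 + 64 = 84 values."""
    features = []
    for k in (1, 2, 3):
        features.extend(kmer_composition(seq, k))
    return features
```

Calling composition_features on a sequence window centered on a candidate cytosine would yield an 84-dimensional vector that could be concatenated with other feature groups before being passed to a classifier such as an SVM.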
format Online
Article
Text
id pubmed-7340967
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher American Society of Gene & Cell Therapy
record_format MEDLINE/PubMed
spelling pubmed-7340967 2020-07-14 Mol Ther Nucleic Acids Article
American Society of Gene & Cell Therapy 2020-06-10 /pmc/articles/PMC7340967/ /pubmed/32645685 http://dx.doi.org/10.1016/j.omtn.2020.06.004 Text en © 2020 The Author(s) http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Prediction of m5C Modifications in RNA Sequences by Combining Multiple Sequence Features
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7340967/
https://www.ncbi.nlm.nih.gov/pubmed/32645685
http://dx.doi.org/10.1016/j.omtn.2020.06.004