Prediction of m5C Modifications in RNA Sequences by Combining Multiple Sequence Features
Main authors: | Dou, Lijun; Li, Xiaoling; Ding, Hui; Xu, Lei; Xiang, Huaikun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Society of Gene & Cell Therapy, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7340967/ https://www.ncbi.nlm.nih.gov/pubmed/32645685 http://dx.doi.org/10.1016/j.omtn.2020.06.004 |
_version_ | 1783555134278598656 |
---|---|
author | Dou, Lijun; Li, Xiaoling; Ding, Hui; Xu, Lei; Xiang, Huaikun |
collection | PubMed |
description | 5-Methylcytosine (m5C) is a well-known post-transcriptional modification that plays significant roles in biological processes such as RNA metabolism, tRNA recognition, and stress responses. Traditional high-throughput techniques for identifying m5C sites are usually time-consuming and expensive. In addition, the number of RNA sequences has grown explosively in the post-genomic era. Machine-learning-based methods are therefore urgently needed to predict RNA m5C modifications quickly and accurately. Here, we propose a novel support-vector-machine (SVM)-based tool, called iRNA-m5C_SVM, that combines multiple sequence features to identify m5C sites in Arabidopsis thaliana. Eight popular feature-extraction methods were first investigated systematically. Then, four well-performing features were combined to construct a comprehensive model: position-specific propensity (PSP) (PSNP, PSDP, and PSTP, based on the frequencies of nucleotides, dinucleotides, and trinucleotides, respectively), nucleotide composition (nucleic acid, di-nucleotide, and tri-nucleotide compositions; NAC, DNC, and TNC, respectively), electron-ion interaction pseudopotentials of trinucleotides (PseEIIPs), and general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-general). The model reached accuracies of 73.06% in 10-fold cross-validation and 80.15% on the independent test, the best predictive performance in A. thaliana among existing models. We believe the proposed model can be a promising alternative for further research on m5C modification sites in plants. |
format | Online Article Text |
id | pubmed-7340967 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | American Society of Gene & Cell Therapy |
record_format | MEDLINE/PubMed |
spelling | pubmed-7340967, 2020-07-14. Mol Ther Nucleic Acids (Article). American Society of Gene & Cell Therapy, 2020-06-10. /pmc/articles/PMC7340967/ /pubmed/32645685 http://dx.doi.org/10.1016/j.omtn.2020.06.004. Text, en. © 2020 The Author(s). This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | Prediction of m5C Modifications in RNA Sequences by Combining Multiple Sequence Features |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7340967/ https://www.ncbi.nlm.nih.gov/pubmed/32645685 http://dx.doi.org/10.1016/j.omtn.2020.06.004 |
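The abstract names several sequence encodings without defining them. As a rough illustration only, the Python sketch below computes some of them for one RNA sequence: nucleotide, dinucleotide, and trinucleotide composition (NAC, DNC, TNC), a PseEIIP vector, and a nucleotide-level position-specific propensity (PSNP). It assumes the commonly cited EIIP values (A 0.1260, C 0.1340, G 0.0806, with U taking the T value 0.1335), treats PSNP as the per-position frequency difference between equal-length positive and negative training sequences, and concatenates the vectors in an arbitrary order. All function names are hypothetical, and the paper's window length, normalization, PSDP/PSTP variants, PC-PseDNC-general, and SVM settings are not reproduced.

```python
from itertools import product

# RNA alphabet; any T in the input is mapped to U before counting.
BASES = "ACGU"

# Commonly cited nucleotide EIIP values (assumed; U takes the value used for T).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "U": 0.1335}


def normalize(seq: str) -> str:
    """Upper-case the sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")


def kmer_composition(seq: str, k: int) -> list[float]:
    """Normalized k-mer frequencies: NAC (k=1), DNC (k=2), TNC (k=3).

    Returns 4**k values ordered lexicographically over ACGU; windows that
    contain other symbols (e.g. N) are skipped.
    """
    seq = normalize(seq)
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def pse_eiip(seq: str) -> list[float]:
    """PseEIIP: for each trinucleotide xyz, (EIIP_x + EIIP_y + EIIP_z) * f_xyz."""
    trimers = ["".join(p) for p in product(BASES, repeat=3)]
    freqs = kmer_composition(seq, 3)
    return [sum(EIIP[b] for b in t) * f for t, f in zip(trimers, freqs)]


def position_frequencies(seqs: list[str]) -> list[dict[str, float]]:
    """Per-position nucleotide frequencies over equal-length, aligned sequences."""
    length = len(seqs[0])
    counts = [dict.fromkeys(BASES, 0) for _ in range(length)]
    for s in seqs:
        for i, b in enumerate(normalize(s)):
            if b in counts[i]:
                counts[i][b] += 1
    return [{b: c / len(seqs) for b, c in col.items()} for col in counts]


def psnp(seq: str, positives: list[str], negatives: list[str]) -> list[float]:
    """Simplified PSNP: positive-set minus negative-set frequency at each position."""
    pos = position_frequencies(positives)
    neg = position_frequencies(negatives)
    return [pos[i].get(b, 0.0) - neg[i].get(b, 0.0)
            for i, b in enumerate(normalize(seq))]


def combined_features(seq: str, positives: list[str], negatives: list[str]) -> list[float]:
    """Hypothetical concatenation of the encodings above into one SVM input vector."""
    return (kmer_composition(seq, 1) + kmer_composition(seq, 2)
            + kmer_composition(seq, 3) + pse_eiip(seq)
            + psnp(seq, positives, negatives))
```

For a sequence of length L, `combined_features` returns 4 + 16 + 64 + 64 + L values. PSDP and PSTP would follow the same pattern as `psnp`, but over overlapping dinucleotides and trinucleotides; the resulting vectors would then be fed to an SVM classifier.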