iRNA5hmC: The First Predictor to Identify RNA 5-Hydroxymethylcytosine Modifications Using Machine Learning

RNA 5-hydroxymethylcytosine (5hmC) modification plays an important role in a series of biological processes. Characterization of its distributions in transcriptome is fundamentally important to reveal the biological functions of 5hmC. Sequencing-based technologies allow the high-throughput identific...

Bibliographic Details
Main authors: Liu, Yuan, Chen, Dasheng, Su, Ran, Chen, Wei, Wei, Leyi
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7137033/
https://www.ncbi.nlm.nih.gov/pubmed/32296686
http://dx.doi.org/10.3389/fbioe.2020.00227
_version_ 1783518356105592832
author Liu, Yuan
Chen, Dasheng
Su, Ran
Chen, Wei
Wei, Leyi
author_facet Liu, Yuan
Chen, Dasheng
Su, Ran
Chen, Wei
Wei, Leyi
author_sort Liu, Yuan
collection PubMed
description RNA 5-hydroxymethylcytosine (5hmC) modification plays an important role in a series of biological processes. Characterizing its distribution across the transcriptome is fundamental to revealing the biological functions of 5hmC. Sequencing-based technologies allow the high-throughput identification of 5hmC; however, they are labor-intensive, time-consuming, and expensive. Thus, there is an urgent need to develop more effective and efficient computational methods that are at least complementary to the high-throughput technologies. In this study, we developed iRNA5hmC, a computational predictive protocol to identify RNA 5hmC sites using machine learning. In this predictor, we introduced a sequence-based feature algorithm consisting of two feature representations, (1) k-mer spectrum and (2) positional nucleotide binary vector, to capture the sequential characteristics of 5hmC sites. Afterward, we used a two-stage feature space optimization strategy to improve the feature representation ability, and trained a predictive model using a support vector machine (SVM). Our feature analysis results showed that feature optimization helps to capture the most discriminative features. Compared with well-known existing feature descriptors, our proposed representations more accurately separate true 5hmC sites from non-5hmC sites. To the best of our knowledge, iRNA5hmC is the first RNA 5hmC predictor that makes predictions based on RNA primary sequences alone, without requiring prior experimental knowledge. Importantly, we have established an easy-to-use webserver, which is currently available at http://server.malab.cn/iRNA5hmC. We expect it to be a useful tool for the prediction of 5hmC sites.
format Online
Article
Text
id pubmed-7137033
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-71370332020-04-15 iRNA5hmC: The First Predictor to Identify RNA 5-Hydroxymethylcytosine Modifications Using Machine Learning Liu, Yuan Chen, Dasheng Su, Ran Chen, Wei Wei, Leyi Front Bioeng Biotechnol Bioengineering and Biotechnology
Frontiers Media S.A. 2020-03-31 /pmc/articles/PMC7137033/ /pubmed/32296686 http://dx.doi.org/10.3389/fbioe.2020.00227 Text en Copyright © 2020 Liu, Chen, Su, Chen and Wei. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Bioengineering and Biotechnology
Liu, Yuan
Chen, Dasheng
Su, Ran
Chen, Wei
Wei, Leyi
iRNA5hmC: The First Predictor to Identify RNA 5-Hydroxymethylcytosine Modifications Using Machine Learning
title iRNA5hmC: The First Predictor to Identify RNA 5-Hydroxymethylcytosine Modifications Using Machine Learning
title_full iRNA5hmC: The First Predictor to Identify RNA 5-Hydroxymethylcytosine Modifications Using Machine Learning
title_fullStr iRNA5hmC: The First Predictor to Identify RNA 5-Hydroxymethylcytosine Modifications Using Machine Learning
title_full_unstemmed iRNA5hmC: The First Predictor to Identify RNA 5-Hydroxymethylcytosine Modifications Using Machine Learning
title_short iRNA5hmC: The First Predictor to Identify RNA 5-Hydroxymethylcytosine Modifications Using Machine Learning
title_sort irna5hmc: the first predictor to identify rna 5-hydroxymethylcytosine modifications using machine learning
topic Bioengineering and Biotechnology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7137033/
https://www.ncbi.nlm.nih.gov/pubmed/32296686
http://dx.doi.org/10.3389/fbioe.2020.00227
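
The description field in this record names two sequence-based feature representations, a k-mer spectrum and a positional nucleotide binary vector. The following is a minimal sketch of what such encodings commonly look like, not the authors' implementation. The record does not give the value of k, the sequence window length, the normalization, or the feature optimization steps, so the function names (`kmer_spectrum`, `positional_binary`, `encode`), the default `k=2`, and the frequency normalization are all assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA nucleotides; this ordering is an assumption


def kmer_spectrum(seq, k=2):
    """Return the normalized frequency of every k-mer over ACGU.

    K-mers are listed in lexicographic order of ALPHABET, so the
    output always has 4**k entries. Windows containing characters
    outside ALPHABET are skipped.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    total = max(len(seq) - k + 1, 1)  # avoid division by zero
    return [counts[m] / total for m in kmers]


def positional_binary(seq):
    """One-hot encode each position: A=1000, C=0100, G=0010, U=0001."""
    vec = []
    for base in seq:
        vec.extend(1 if base == letter else 0 for letter in ALPHABET)
    return vec


def encode(seq, k=2):
    """Concatenate both representations into one feature vector."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    return kmer_spectrum(seq, k) + positional_binary(seq)


if __name__ == "__main__":
    features = encode("ACGUC")
    print(len(features))  # 4**2 k-mer entries plus 4 per position
```

Vectors like these could then be passed to an SVM classifier (for example, one from scikit-learn), after whatever feature selection is applied; that training step is left out here because the record does not describe its settings.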