
RF-PseU: A Random Forest Predictor for RNA Pseudouridine Sites


Bibliographic Details
Main authors: Lv, Zhibin; Zhang, Jun; Ding, Hui; Zou, Quan
Format: Online article, text
Language: English
Published: Frontiers Media S.A., 2020
Subjects: Bioengineering and Biotechnology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7054385/
https://www.ncbi.nlm.nih.gov/pubmed/32175316
http://dx.doi.org/10.3389/fbioe.2020.00134
author Lv, Zhibin
Zhang, Jun
Ding, Hui
Zou, Quan
collection PubMed
description Pseudouridine modification, one of the ubiquitous chemical modifications in RNA, is crucial for various cellular biological and physiological processes. To gain more insight into the functional mechanisms involved, it is of fundamental importance to identify pseudouridine sites in RNA precisely. With the progress of next-generation sequencing technology, several useful machine learning approaches have recently become available; however, existing methods cannot predict these sites with high accuracy. Thus, a more accurate predictor is required. In this study, a random forest-based predictor named RF-PseU is proposed for predicting pseudouridylation sites. To optimize feature representation and obtain a better model, the light gradient boosting machine (LightGBM) algorithm and an incremental feature selection strategy were used to select the optimal feature space for training the random forest model RF-PseU. On the same benchmark data sets from three species, RF-PseU performs better overall than previous state-of-the-art predictors. The integrated average accuracies in leave-one-out cross-validation and independent testing were 71.4% and 74.7%, respectively, improvements of 3.63% and 4.77% over the best existing predictor. The final RF-PseU prediction model was built using leave-one-out cross-validation and provides a reliable and robust tool for identifying pseudouridine sites. A web server with a user-friendly interface is accessible at http://148.70.81.170:10228/rfpseu.
format Online
Article
Text
id pubmed-7054385
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7054385 2020-03-13 RF-PseU: A Random Forest Predictor for RNA Pseudouridine Sites. Lv, Zhibin; Zhang, Jun; Ding, Hui; Zou, Quan. Front Bioeng Biotechnol (Bioengineering and Biotechnology). Frontiers Media S.A., 2020-02-26. Copyright © 2020 Lv, Zhang, Ding and Zou.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title RF-PseU: A Random Forest Predictor for RNA Pseudouridine Sites
topic Bioengineering and Biotechnology
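The description names a two-stage pipeline: features are ranked with the light gradient boosting machine, then an incremental feature selection (IFS) loop grows the feature set one ranked feature at a time and keeps the size that scores best. The Python sketch below illustrates only that IFS loop under stated assumptions; it is not the authors' code. The feature ranking is taken as given (standing in for LightGBM importances), a nearest-centroid classifier is swapped in for the random forest so the example runs with the standard library alone, and the function names and toy data are hypothetical. In practice one would more likely take the ranking from `lightgbm.LGBMClassifier.feature_importances_` and score subsets with scikit-learn's `RandomForestClassifier` under `LeaveOneOut`.

```python
"""Minimal, hypothetical sketch of incremental feature selection (IFS).

Assumptions (not taken from RF-PseU): the feature ranking is supplied by the
caller (RF-PseU derived it from LightGBM), and a nearest-centroid classifier
stands in for the random forest so the sketch needs only the standard library.
"""
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def nearest_centroid_predict(train_x: Matrix, train_y: List[int], x: Sequence[float]) -> int:
    """Stand-in classifier (NOT a random forest): pick the class whose mean
    feature vector is closest to x in squared Euclidean distance."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [row for row, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def loocv_accuracy(xs: Matrix, ys: List[int]) -> float:
    """Leave-one-out cross-validation: hold out each sample once, train on the rest."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


def incremental_feature_selection(
    xs: Matrix, ys: List[int], ranking: List[int]
) -> Tuple[List[int], float, List[Tuple[int, float]]]:
    """Evaluate the top-1, top-2, ... ranked features and keep the best prefix.

    Returns (selected feature indices, best LOOCV accuracy, per-k history).
    A strict '>' keeps the smallest feature set among equally accurate ones.
    """
    best_k, best_acc = 0, -1.0
    history: List[Tuple[int, float]] = []
    for k in range(1, len(ranking) + 1):
        columns = ranking[:k]
        subset = [[row[c] for c in columns] for row in xs]
        acc = loocv_accuracy(subset, ys)
        history.append((k, acc))
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranking[:best_k], best_acc, history


# Hypothetical toy data: feature 0 separates the classes; features 1 and 2 are noise.
TOY_X: Matrix = [
    [0.0, 0.0, 0.0],
    [1.0, 50.0, 0.0],
    [0.0, 0.0, 50.0],
    [10.0, 50.0, 50.0],
    [11.0, 0.0, 0.0],
    [10.0, 50.0, 0.0],
]
TOY_Y: List[int] = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    selected, accuracy, history = incremental_feature_selection(TOY_X, TOY_Y, [0, 1, 2])
    print("selected features:", selected, "LOOCV accuracy:", accuracy)
    for k, acc in history:
        print(f"top-{k} features -> accuracy {acc:.3f}")
```

Breaking ties toward the smaller feature set reflects the usual preference for a compact model. Leave-one-out scoring retrains once per sample for every candidate size, so its cost grows quickly with the number of samples and features.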