RF-PseU: A Random Forest Predictor for RNA Pseudouridine Sites
One of the ubiquitous chemical modifications in RNA, pseudouridine modification is crucial for various cellular biological and physiological processes. To gain more insight into the functional mechanisms involved, it is of fundamental importance to precisely identify pseudouridine sites in RNA. Seve...
Main Authors: | Lv, Zhibin; Zhang, Jun; Ding, Hui; Zou, Quan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2020 |
Subjects: | Bioengineering and Biotechnology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7054385/ https://www.ncbi.nlm.nih.gov/pubmed/32175316 http://dx.doi.org/10.3389/fbioe.2020.00134 |
_version_ | 1783503186492915712 |
---|---|
author | Lv, Zhibin Zhang, Jun Ding, Hui Zou, Quan |
author_sort | Lv, Zhibin |
collection | PubMed |
description | One of the ubiquitous chemical modifications in RNA, pseudouridine modification is crucial for various cellular biological and physiological processes. To gain more insight into the functional mechanisms involved, it is of fundamental importance to precisely identify pseudouridine sites in RNA. Several useful machine learning approaches have become available recently, with the increasing progress of next-generation sequencing technology; however, existing methods cannot predict sites with high accuracy. Thus, a more accurate predictor is required. In this study, a random forest-based predictor named RF-PseU is proposed for prediction of pseudouridylation sites. To optimize feature representation and obtain a better model, the light gradient boosting machine algorithm and incremental feature selection strategy were used to select the optimum feature space vector for training the random forest model RF-PseU. Compared with previous state-of-the-art predictors, the results on the same benchmark data sets of three species demonstrate that RF-PseU performs better overall. The integrated average leave-one-out cross-validation and independent testing accuracy scores were 71.4% and 74.7%, respectively, representing increments of 3.63% and 4.77% versus the best existing predictor. Moreover, the final RF-PseU model for prediction was built on leave-one-out cross-validation and provides a reliable and robust tool for identifying pseudouridine sites. A web server with a user-friendly interface is accessible at http://148.70.81.170:10228/rfpseu. |
format | Online Article Text |
id | pubmed-7054385 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7054385 2020-03-13 RF-PseU: A Random Forest Predictor for RNA Pseudouridine Sites Lv, Zhibin Zhang, Jun Ding, Hui Zou, Quan Front Bioeng Biotechnol Bioengineering and Biotechnology Frontiers Media S.A. 2020-02-26 /pmc/articles/PMC7054385/ /pubmed/32175316 http://dx.doi.org/10.3389/fbioe.2020.00134 Text en Copyright © 2020 Lv, Zhang, Ding and Zou. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | RF-PseU: A Random Forest Predictor for RNA Pseudouridine Sites |
topic | Bioengineering and Biotechnology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7054385/ https://www.ncbi.nlm.nih.gov/pubmed/32175316 http://dx.doi.org/10.3389/fbioe.2020.00134 |
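
The description above mentions an incremental feature selection strategy evaluated with leave-one-out cross-validation. The following is a minimal, standard-library-only sketch of that general idea, not the authors' implementation. Every name in it (`make_toy_data`, `rank_features`, `predict_nearest_centroid`, `loo_accuracy`, `incremental_feature_selection`) is hypothetical, and the data are synthetic. Two stand-ins are used so the sketch runs without extra packages: a difference-of-class-means score replaces the LightGBM feature ranking, and a nearest-centroid classifier replaces the random forest.

```python
import random
from statistics import mean


def make_toy_data(n_per_class=20, n_features=6, n_informative=2, seed=0):
    """Synthetic data (assumption): only the first n_informative features differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                   for j in range(n_features)]
            X.append(row)
            y.append(label)
    return X, y


def rank_features(X, y):
    """Stand-in ranking (not LightGBM): sort features by |mean(class 1) - mean(class 0)|."""
    scores = []
    for j in range(len(X[0])):
        m0 = mean(row[j] for row, label in zip(X, y) if label == 0)
        m1 = mean(row[j] for row, label in zip(X, y) if label == 1)
        scores.append(abs(m1 - m0))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def predict_nearest_centroid(train_X, train_y, x):
    """Stand-in classifier (not a random forest): pick the class with the closer centroid."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, features):
    """Leave-one-out CV: hold out each sample once and predict it from all the others."""
    hits = 0
    for i in range(len(X)):
        train_X = [[row[j] for j in features] for k, row in enumerate(X) if k != i]
        train_y = [label for k, label in enumerate(y) if k != i]
        held_out = [X[i][j] for j in features]
        hits += predict_nearest_centroid(train_X, train_y, held_out) == y[i]
    return hits / len(X)


def incremental_feature_selection(X, y):
    """Add ranked features one at a time and keep the prefix with the best LOO accuracy."""
    ranking = rank_features(X, y)
    best_features, best_acc = [], -1.0
    for k in range(1, len(ranking) + 1):
        acc = loo_accuracy(X, y, ranking[:k])
        if acc > best_acc:
            best_features, best_acc = ranking[:k], acc
    return best_features, best_acc


X, y = make_toy_data()
selected, accuracy = incremental_feature_selection(X, y)
print(f"selected features: {selected}, LOO accuracy: {accuracy:.3f}")
```

For simplicity the ranking here is computed once on the full data set, so it sees the held-out samples. A stricter evaluation would recompute the ranking inside each leave-one-out fold.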