
PseUI: Pseudouridine sites identification based on RNA sequence information


Bibliographic Details
Main authors: He, Jingjing; Fang, Ting; Zhang, Zizheng; Huang, Bei; Zhu, Xiaolei; Xiong, Yi
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6114832/
https://www.ncbi.nlm.nih.gov/pubmed/30157750
http://dx.doi.org/10.1186/s12859-018-2321-0
collection PubMed
description BACKGROUND: Pseudouridylation is the most prevalent type of post-transcriptional modification in the various stable RNAs of all organisms, and it significantly affects many cellular processes that are regulated by RNA. Accurate identification of pseudouridine (Ψ) sites in RNA will therefore be of great benefit for understanding these processes. Because currently available experimental methods are inefficient and costly, computational methods that detect Ψ sites in RNA sequences accurately and efficiently are highly desirable. However, the predictive accuracy of existing computational methods is not satisfactory and still needs improvement.
RESULTS: In this study, we developed a new model, PseUI, for Ψ site identification in three species: H. sapiens, S. cerevisiae, and M. musculus. First, five kinds of features were generated from RNA segments: nucleotide composition (NC), dinucleotide composition (DC), pseudo dinucleotide composition (pseDNC), position-specific nucleotide propensity (PSNP), and position-specific dinucleotide propensity (PSDP). Then, a sequential forward feature selection strategy was used to obtain an effective feature subset with a compact representation but discriminative predictive power. Based on the selected feature subsets, we built our model using a support vector machine (SVM). Finally, the generalization of our model was validated by both the jackknife test and independent validation tests on the benchmark datasets. The experimental results showed that our model is more accurate and stable than previously published models. We have also provided a user-friendly web server for our model at http://zhulab.ahu.edu.cn/PseUI, and brief instructions for the web server are provided in this paper. By following these instructions, academic users can conveniently obtain their desired results without complicated calculations.
CONCLUSION: In this study, we proposed a new predictor, PseUI, to detect Ψ sites in RNA sequences. Our model is shown to outperform the existing state-of-the-art models. We expect that PseUI will become a useful tool for accurate identification of RNA Ψ sites.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2321-0) contains supplementary material, which is available to authorized users.
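Two of the feature types named in the abstract, nucleotide composition (NC) and dinucleotide composition (DC), and the sequential forward selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the PseUI implementation: the function names are hypothetical, DC is assumed to count overlapping dinucleotides, the selection is assumed to add one candidate at a time until the score stops improving, and the `score` callable stands in for whatever evaluation the authors used (the abstract mentions an SVM validated by the jackknife test; no SVM is included here). pseDNC, PSNP and PSDP are not shown.

```python
from itertools import product
from typing import Callable, List, Sequence

# Assumption: RNA alphabet A, C, G, U; any T in the input is read as U.
NUCLEOTIDES = "ACGU"
# The 16 dinucleotides in a fixed order: AA, AC, AG, AU, CA, ..., UU.
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def _normalize(seq: str) -> str:
    """Upper-case the segment and map DNA-style T to U."""
    return seq.upper().replace("T", "U")


def nucleotide_composition(seq: str) -> List[float]:
    """NC: relative frequency of A, C, G, U in the segment (4 values)."""
    seq = _normalize(seq)
    return [seq.count(b) / len(seq) for b in NUCLEOTIDES]


def dinucleotide_composition(seq: str) -> List[float]:
    """DC: relative frequency of the 16 overlapping dinucleotides (16 values).

    Assumes the segment has at least two nucleotides.
    """
    seq = _normalize(seq)
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(d) / len(pairs) for d in DINUCLEOTIDES]


def sequential_forward_selection(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
) -> List[str]:
    """Greedy forward selection: repeatedly add the candidate that yields the
    highest score, stopping when no remaining candidate improves on the best
    score so far. `score` is a caller-supplied (hypothetical) evaluation,
    e.g. a cross-validated classifier accuracy."""
    selected: List[str] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining:
        trial = {c: score(selected + [c]) for c in remaining}
        choice = max(trial, key=trial.get)
        if trial[choice] <= best:
            break  # no candidate improves the score: stop
        best = trial[choice]
        selected.append(choice)
        remaining.remove(choice)
    return selected


if __name__ == "__main__":
    print(nucleotide_composition("AUGCUU"))
    # Toy example: hypothetical additive weights per feature type.
    weights = {"NC": 0.6, "DC": 0.3, "PSNP": 0.8, "PSDP": -0.1}
    chosen = sequential_forward_selection(
        list(weights), lambda s: sum(weights[f] for f in s)
    )
    print(chosen)  # ['PSNP', 'NC', 'DC']
```

The toy weights are invented for illustration only; treating whole feature types as selection candidates is also an assumption, since the abstract does not say whether selection operated on individual features or on feature groups.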
id pubmed-6114832
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6114832 2018-09-04. BMC Bioinformatics, Research Article. BioMed Central 2018-08-29. Text en. © The Author(s) 2018. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research Article