PseUI: Pseudouridine sites identification based on RNA sequence information
BACKGROUND: Pseudouridylation is the most prevalent type of posttranscriptional modification in various stable RNAs of all organisms, which significantly affects many cellular processes that are regulated by RNA. Thus, accurate identification of pseudouridine (Ψ) sites in RNA will be of great benefi...
Main Authors: | He, Jingjing; Fang, Ting; Zhang, Zizheng; Huang, Bei; Zhu, Xiaolei; Xiong, Yi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6114832/ https://www.ncbi.nlm.nih.gov/pubmed/30157750 http://dx.doi.org/10.1186/s12859-018-2321-0 |
_version_ | 1783351267038330880 |
---|---|
author | He, Jingjing; Fang, Ting; Zhang, Zizheng; Huang, Bei; Zhu, Xiaolei; Xiong, Yi |
author_facet | He, Jingjing; Fang, Ting; Zhang, Zizheng; Huang, Bei; Zhu, Xiaolei; Xiong, Yi |
author_sort | He, Jingjing |
collection | PubMed |
description | BACKGROUND: Pseudouridylation is the most prevalent type of posttranscriptional modification in various stable RNAs of all organisms, which significantly affects many cellular processes that are regulated by RNA. Thus, accurate identification of pseudouridine (Ψ) sites in RNA will be of great benefit for understanding these cellular processes. Due to the low efficiency and high cost of current available experimental methods, it is highly desirable to develop computational methods for accurately and efficiently detecting Ψ sites in RNA sequences. However, the predictive accuracy of existing computational methods is not satisfactory and still needs improvement. RESULTS: In this study, we developed a new model, PseUI, for Ψ sites identification in three species, which are H. sapiens, S. cerevisiae, and M. musculus. Firstly, five different kinds of features including nucleotide composition (NC), dinucleotide composition (DC), pseudo dinucleotide composition (pseDNC), position-specific nucleotide propensity (PSNP), and position-specific dinucleotide propensity (PSDP) were generated based on RNA segments. Then, a sequential forward feature selection strategy was used to gain an effective feature subset with a compact representation but discriminative prediction power. Based on the selected feature subsets, we built our model by using a support vector machine (SVM). Finally, the generalization of our model was validated by both the jackknife test and independent validation tests on the benchmark datasets. The experimental results showed that our model is more accurate and stable than the previously published models. We have also provided a user-friendly web server for our model at http://zhulab.ahu.edu.cn/PseUI, and a brief instruction for the web server is provided in this paper. By using this instruction, the academic users can conveniently get their desired results without complicated calculations. 
CONCLUSION: In this study, we proposed a new predictor, PseUI, to detect Ψ sites in RNA sequences. It is shown that our model outperformed the existing state-of-art models. It is expected that our model, PseUI, will become a useful tool for accurate identification of RNA Ψ sites. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2321-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6114832 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6114832 2018-09-04 PseUI: Pseudouridine sites identification based on RNA sequence information He, Jingjing; Fang, Ting; Zhang, Zizheng; Huang, Bei; Zhu, Xiaolei; Xiong, Yi BMC Bioinformatics Research Article BioMed Central 2018-08-29 /pmc/articles/PMC6114832/ /pubmed/30157750 http://dx.doi.org/10.1186/s12859-018-2321-0 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article He, Jingjing Fang, Ting Zhang, Zizheng Huang, Bei Zhu, Xiaolei Xiong, Yi PseUI: Pseudouridine sites identification based on RNA sequence information |
title | PseUI: Pseudouridine sites identification based on RNA sequence information |
title_full | PseUI: Pseudouridine sites identification based on RNA sequence information |
title_fullStr | PseUI: Pseudouridine sites identification based on RNA sequence information |
title_full_unstemmed | PseUI: Pseudouridine sites identification based on RNA sequence information |
title_short | PseUI: Pseudouridine sites identification based on RNA sequence information |
title_sort | pseui: pseudouridine sites identification based on rna sequence information |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6114832/ https://www.ncbi.nlm.nih.gov/pubmed/30157750 http://dx.doi.org/10.1186/s12859-018-2321-0 |
work_keys_str_mv | AT hejingjing pseuipseudouridinesitesidentificationbasedonrnasequenceinformation AT fangting pseuipseudouridinesitesidentificationbasedonrnasequenceinformation AT zhangzizheng pseuipseudouridinesitesidentificationbasedonrnasequenceinformation AT huangbei pseuipseudouridinesitesidentificationbasedonrnasequenceinformation AT zhuxiaolei pseuipseudouridinesitesidentificationbasedonrnasequenceinformation AT xiongyi pseuipseudouridinesitesidentificationbasedonrnasequenceinformation |
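
The description field above names nucleotide composition (NC) and dinucleotide composition (DC) among the features generated from RNA segments. As a rough illustration only, the sketch below computes both as plain frequency vectors. The function names, the `ACGU` ordering, and the example segment are assumptions made for this sketch and are not taken from the PseUI paper or web server; the other feature types (pseDNC, PSNP, PSDP), the feature selection step, and the SVM are not shown.

```python
from itertools import product

# Assumed alphabet and ordering for this sketch (not specified in the record).
ALPHABET = "ACGU"


def nucleotide_composition(segment):
    """Return the frequency of A, C, G, U in the segment (4 values)."""
    seq = segment.upper()
    if not seq:
        raise ValueError("segment must not be empty")
    return [seq.count(base) / len(seq) for base in ALPHABET]


def dinucleotide_composition(segment):
    """Return the frequency of each of the 16 overlapping dinucleotides."""
    seq = segment.upper()
    if len(seq) < 2:
        raise ValueError("segment must contain at least two nucleotides")
    # Overlapping pairs: positions (0,1), (1,2), ..., (n-2, n-1).
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(a + b) / len(pairs)
            for a, b in product(ALPHABET, repeat=2)]


def composition_features(segment):
    """Concatenate NC (4 values) and DC (16 values) into one 20-value vector."""
    return nucleotide_composition(segment) + dinucleotide_composition(segment)


if __name__ == "__main__":
    # Hypothetical segment, used only to show the call.
    features = composition_features("GAUCUAGGCUA")
    print(len(features))
```

A vector like this could then be passed to a feature selection step and a classifier, as the abstract describes; how PseUI itself encodes and scales these features is not stated in this record.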