
CPPred: coding potential prediction based on the global description of RNA sequence

Rapid and accurate methods for distinguishing coding RNAs from ncRNAs play a critical role in analyzing the thousands of novel transcripts generated in recent years by next-generation sequencing technology. The previously developed methods CPAT, CPC2 and PLEK can disting...


Bibliographic Details
Main authors: Tong, Xiaoxue; Liu, Shiyong
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6486542/
https://www.ncbi.nlm.nih.gov/pubmed/30753596
http://dx.doi.org/10.1093/nar/gkz087
author Tong, Xiaoxue
Liu, Shiyong
collection PubMed
description Rapid and accurate methods for distinguishing coding RNAs from ncRNAs play a critical role in analyzing the thousands of novel transcripts generated in recent years by next-generation sequencing technology. The previously developed methods CPAT, CPC2 and PLEK distinguish coding RNAs from ncRNAs very well, but distinguish poorly between small coding RNAs and small ncRNAs. Herein, we report an approach, CPPred (coding potential prediction), which is based on an SVM classifier and multiple sequence features, including novel RNA features encoded by the global description. Owing to the addition of these novel RNA features, CPPred distinguishes better than the state-of-the-art methods not only between coding RNAs and ncRNAs, but also between small coding RNAs and small ncRNAs. A recent study proposed 1335 novel human coding RNAs from a large number of RNA-seq datasets. However, CPPred predicts only 119 of these transcripts as coding RNAs. In fact, almost all of the proposed novel coding RNAs are ncRNAs (91.1%), which is consistent with previous reports. Remarkably, we also reveal that the global description encoding features (T2, C0 and GC) play an important role in the prediction of coding potential.
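The 91.1% figure in the description can be checked against the two counts quoted with it. The following is a minimal sketch, assuming that every proposed transcript not predicted as coding is counted as an ncRNA; the variable names are illustrative and not taken from the record or the paper.

```python
# Counts quoted in the description above.
proposed = 1335          # novel human coding RNAs proposed by the earlier study
predicted_coding = 119   # of those, predicted as coding RNAs by CPPred

# Assumption: all remaining transcripts are treated as ncRNAs.
predicted_noncoding = proposed - predicted_coding

# Share of the proposed transcripts that end up classified as ncRNA.
noncoding_pct = round(100 * predicted_noncoding / proposed, 1)

print(predicted_noncoding, noncoding_pct)  # prints: 1216 91.1
```

Under that assumption, the counts and the percentage given in the description agree with each other.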
format Online
Article
Text
id pubmed-6486542
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6486542 2019-05-01 CPPred: coding potential prediction based on the global description of RNA sequence Tong, Xiaoxue Liu, Shiyong Nucleic Acids Res Methods Online Oxford University Press 2019-05-07 2019-02-11 /pmc/articles/PMC6486542/ /pubmed/30753596 http://dx.doi.org/10.1093/nar/gkz087 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title CPPred: coding potential prediction based on the global description of RNA sequence
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6486542/
https://www.ncbi.nlm.nih.gov/pubmed/30753596
http://dx.doi.org/10.1093/nar/gkz087