Prediction and classification of ncRNAs using structural information

BACKGROUND: Evidence is accumulating that non-coding transcripts, previously thought to be functionally inert, play important roles in various cellular activities. High throughput techniques like next generation sequencing have resulted in the generation of vast amounts of sequence data. It is there...


Bibliographic Details
Main authors: Panwar, Bharat; Arora, Amit; Raghava, Gajendra PS
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3925371/
https://www.ncbi.nlm.nih.gov/pubmed/24521294
http://dx.doi.org/10.1186/1471-2164-15-127
author Panwar, Bharat
Arora, Amit
Raghava, Gajendra PS
collection PubMed
description BACKGROUND: Evidence is accumulating that non-coding transcripts, previously thought to be functionally inert, play important roles in various cellular activities. High-throughput techniques such as next-generation sequencing have generated vast amounts of sequence data. It is therefore desirable not only to discriminate between coding and non-coding transcripts, but also to assign non-coding RNA (ncRNA) transcripts to their respective classes (families). Although several algorithms are available for this task, their classification performance remains a major concern. Given the crucial role that non-coding transcripts play in cellular processes, algorithms that can precisely classify ncRNA transcripts are needed.
RESULTS: In this study, we first develop prediction tools to discriminate between coding and non-coding transcripts and then classify ncRNAs into their respective classes. In comparison with existing methods that employ multiple features, our SVM-based method, using a single feature (tri-nucleotide composition), achieved an MCC of 0.98. Knowing that the structure of an ncRNA transcript can provide insights into its biological function, we use graph properties of predicted ncRNA structures to classify the transcripts into 18 different ncRNA classes. We developed classification models using a variety of algorithms (BayesNet, NaiveBayes, MultilayerPerceptron, IBk, libSVM, SMO and RandomForest) and observed that the model based on RandomForest performed better than the other models. Compared with the GraPPLE study, sensitivity was higher for 13 classes and specificity was higher for 14 classes. Moreover, the overall sensitivity of 0.43 exceeds that of GraPPLE (0.33), and the overall MCC of 0.40 (versus 0.29 for GraPPLE) was significantly higher for our method. This clearly demonstrates that our models are more accurate than existing models.
CONCLUSIONS: This work conclusively demonstrates that a simple feature, tri-nucleotide composition, is sufficient to discriminate between coding and non-coding RNA sequences. Similarly, a feature set based on graph properties, combined with the RandomForest algorithm, is most suitable for classifying different ncRNA classes. We have also developed an online and standalone tool, RNAcon (http://crdd.osdd.net/raghava/rnacon).
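The abstract names two quantities without defining them: tri-nucleotide composition (the single feature used for coding versus non-coding discrimination) and the Matthews correlation coefficient (MCC). The following is a minimal Python sketch of both, not the authors' implementation. It assumes overlapping 3-mers, an ACGU alphabet with T mapped to U, and frequencies normalized by the number of valid windows; the function names `trinucleotide_composition` and `mcc` are hypothetical and do not come from RNAcon.

```python
from itertools import product
from math import sqrt

# Assumption: RNA alphabet; DNA input is handled by mapping T to U.
ALPHABET = "ACGU"
# All 64 tri-nucleotides in a fixed (lexicographic) order.
TRIMERS = ["".join(p) for p in product(ALPHABET, repeat=3)]


def trinucleotide_composition(seq):
    """Return a 64-element vector of overlapping 3-mer frequencies."""
    seq = seq.upper().replace("T", "U")
    counts = dict.fromkeys(TRIMERS, 0)
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        # Windows containing ambiguous bases (e.g. N) are skipped.
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIMERS)
    return [counts[k] / total for k in TRIMERS]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventional choice: report 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    vec = trinucleotide_composition("AUGAUG")
    print(len(vec), vec[TRIMERS.index("AUG")])
    print(mcc(tp=50, tn=50, fp=0, fn=0))
```

A vector of this kind could be passed to any standard classifier; the record itself states only that an SVM was used and does not specify kernel or parameters.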
format Online
Article
Text
id pubmed-3925371
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3925371 2014-03-04 Prediction and classification of ncRNAs using structural information Panwar, Bharat Arora, Amit Raghava, Gajendra PS BMC Genomics Research Article BioMed Central 2014-02-13 /pmc/articles/PMC3925371/ /pubmed/24521294 http://dx.doi.org/10.1186/1471-2164-15-127 Text en Copyright © 2014 Panwar et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Prediction and classification of ncRNAs using structural information
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3925371/
https://www.ncbi.nlm.nih.gov/pubmed/24521294
http://dx.doi.org/10.1186/1471-2164-15-127