Prediction and classification of ncRNAs using structural information

BACKGROUND: Evidence is accumulating that non-coding transcripts, previously thought to be functionally inert, play important roles in various cellular activities. High throughput techniques like next generation sequencing have resulted in the generation of vast amounts of sequence data. It is there...

Bibliographic Details
Main authors:	Panwar, Bharat; Arora, Amit; Raghava, Gajendra PS
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3925371/ https://www.ncbi.nlm.nih.gov/pubmed/24521294 http://dx.doi.org/10.1186/1471-2164-15-127

_version_	1782303851068522496
author	Panwar, Bharat; Arora, Amit; Raghava, Gajendra PS
author_facet	Panwar, Bharat; Arora, Amit; Raghava, Gajendra PS
author_sort	Panwar, Bharat
collection	PubMed
description	BACKGROUND: Evidence is accumulating that non-coding transcripts, previously thought to be functionally inert, play important roles in various cellular activities. High-throughput techniques such as next-generation sequencing have generated vast amounts of sequence data. It is therefore desirable not only to discriminate coding from non-coding transcripts, but also to assign the non-coding RNA (ncRNA) transcripts to their respective classes (families). Although several algorithms are available for this task, their classification performance remains a major concern. Given the crucial role that non-coding transcripts play in cellular processes, algorithms that can precisely classify ncRNA transcripts are needed. RESULTS: In this study, we first develop prediction tools to discriminate coding from non-coding transcripts and then classify ncRNAs into their respective classes. In contrast to existing methods that employ multiple features, our SVM-based method, using a single feature (tri-nucleotide composition), achieved an MCC of 0.98. Because the structure of an ncRNA transcript can provide insights into its biological function, we use graph properties of predicted ncRNA structures to classify the transcripts into 18 different ncRNA classes. We developed classification models using a variety of algorithms (BayesNet, NaiveBayes, MultilayerPerceptron, IBk, libSVM, SMO and RandomForest) and observed that the RandomForest-based model performed better than the other models. Compared with the GraPPLE study, sensitivity was higher for 13 classes and specificity was higher for 14 classes. Moreover, the overall sensitivity of 0.43 exceeds that of GraPPLE (0.33), and the overall MCC of 0.40 is higher than GraPPLE's MCC of 0.29. This indicates that our models are more accurate than existing models. CONCLUSIONS: This work demonstrates that a simple feature, tri-nucleotide composition, is sufficient to discriminate between coding and non-coding RNA sequences. Similarly, a feature set based on graph properties, combined with the RandomForest algorithm, is the most suitable for classifying different ncRNA classes. We have also developed an online and standalone tool, RNAcon (http://crdd.osdd.net/raghava/rnacon).
format	Online Article Text
id	pubmed-3925371
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3925371 2014-03-04 Prediction and classification of ncRNAs using structural information Panwar, Bharat; Arora, Amit; Raghava, Gajendra PS BMC Genomics Research Article BACKGROUND: Evidence is accumulating that non-coding transcripts, previously thought to be functionally inert, play important roles in various cellular activities. High-throughput techniques such as next-generation sequencing have generated vast amounts of sequence data. It is therefore desirable not only to discriminate coding from non-coding transcripts, but also to assign the non-coding RNA (ncRNA) transcripts to their respective classes (families). Although several algorithms are available for this task, their classification performance remains a major concern. Given the crucial role that non-coding transcripts play in cellular processes, algorithms that can precisely classify ncRNA transcripts are needed. RESULTS: In this study, we first develop prediction tools to discriminate coding from non-coding transcripts and then classify ncRNAs into their respective classes. In contrast to existing methods that employ multiple features, our SVM-based method, using a single feature (tri-nucleotide composition), achieved an MCC of 0.98. Because the structure of an ncRNA transcript can provide insights into its biological function, we use graph properties of predicted ncRNA structures to classify the transcripts into 18 different ncRNA classes. We developed classification models using a variety of algorithms (BayesNet, NaiveBayes, MultilayerPerceptron, IBk, libSVM, SMO and RandomForest) and observed that the RandomForest-based model performed better than the other models. Compared with the GraPPLE study, sensitivity was higher for 13 classes and specificity was higher for 14 classes. Moreover, the overall sensitivity of 0.43 exceeds that of GraPPLE (0.33), and the overall MCC of 0.40 is higher than GraPPLE's MCC of 0.29. This indicates that our models are more accurate than existing models. CONCLUSIONS: This work demonstrates that a simple feature, tri-nucleotide composition, is sufficient to discriminate between coding and non-coding RNA sequences. Similarly, a feature set based on graph properties, combined with the RandomForest algorithm, is the most suitable for classifying different ncRNA classes. We have also developed an online and standalone tool, RNAcon (http://crdd.osdd.net/raghava/rnacon). BioMed Central 2014-02-13 /pmc/articles/PMC3925371/ /pubmed/24521294 http://dx.doi.org/10.1186/1471-2164-15-127 Text en Copyright © 2014 Panwar et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Panwar, Bharat Arora, Amit Raghava, Gajendra PS Prediction and classification of ncRNAs using structural information
title	Prediction and classification of ncRNAs using structural information
title_full	Prediction and classification of ncRNAs using structural information
title_fullStr	Prediction and classification of ncRNAs using structural information
title_full_unstemmed	Prediction and classification of ncRNAs using structural information
title_short	Prediction and classification of ncRNAs using structural information
title_sort	prediction and classification of ncrnas using structural information
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3925371/ https://www.ncbi.nlm.nih.gov/pubmed/24521294 http://dx.doi.org/10.1186/1471-2164-15-127
work_keys_str_mv	AT panwarbharat predictionandclassificationofncrnasusingstructuralinformation AT aroraamit predictionandclassificationofncrnasusingstructuralinformation AT raghavagajendraps predictionandclassificationofncrnasusingstructuralinformation
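
The description field above names tri-nucleotide composition as the single feature used to separate coding from non-coding transcripts, and reports results as MCC (Matthews correlation coefficient). The following is a minimal illustrative sketch of both ideas, not the RNAcon implementation. The function names, the use of overlapping windows, the percentage scaling, and the conversion of T to U are all assumptions made for this example; the record does not specify how the authors computed the feature.

```python
from itertools import product
from math import sqrt

# All 64 possible tri-nucleotides over the RNA alphabet, in a fixed order.
ALPHABET = "ACGU"
TRINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=3)]


def trinucleotide_composition(seq):
    """Return a 64-element vector of tri-nucleotide percentages.

    Assumptions (not taken from the record): windows overlap with a
    step of 1, DNA 'T' is read as 'U', and windows containing any
    other symbol (for example 'N') are skipped.
    """
    seq = seq.upper().replace("T", "U")
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [100.0 * counts[k] / total for k in TRINUCLEOTIDES]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    features = trinucleotide_composition("AUGAUG")
    print(len(features))                  # length of the feature vector
    print(mcc(tp=50, tn=50, fp=0, fn=0))  # a perfect classifier
```

A vector like this could be passed to any standard classifier; the abstract reports using an SVM for this step, but the kernel and parameters are not given in this record.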
