NCodR: A multi-class support vector machine classification to distinguish non-coding RNAs in Viridiplantae

Bibliographic Details
Main authors: Nithin, Chandran; Mukherjee, Sunandan; Basak, Jolly; Bahadur, Ranjit Prasad
Format: Online Article Text
Language: English
Published: Cambridge University Press 2022
Subjects: Original Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10095871/
https://www.ncbi.nlm.nih.gov/pubmed/37077974
http://dx.doi.org/10.1017/qpb.2022.18
author Nithin, Chandran
Mukherjee, Sunandan
Basak, Jolly
Bahadur, Ranjit Prasad
collection PubMed
description Non-coding RNAs (ncRNAs) are major players in the regulation of gene expression. This study analyses seven classes of ncRNAs in plants using sequence and secondary structure-based RNA folding measures. We observe distinct regions in the distribution of AU content, along with overlapping regions, for different ncRNA classes. Additionally, we find similar averages for the minimum folding energy index across the ncRNA classes, except for pre-miRNAs and lncRNAs. Various RNA folding measures show similar trends among the different ncRNA classes, again except for pre-miRNAs and lncRNAs. We observe different k-mer repeat signatures of length three among the ncRNA classes; in pre-miRNAs and lncRNAs, however, a diffuse pattern of k-mers is observed. Using these attributes, we train eight different classifiers to discriminate the ncRNA classes in plants. Support vector machines employing a radial basis function kernel show the highest accuracy (average F1 of ~96%) in discriminating ncRNAs, and the classifier is implemented as a web server, NCodR.
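As a rough illustration of two of the sequence attributes named in the description (AU content and k-mer signatures of length three), the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the function names (`au_content`, `kmer_frequencies`, `feature_vector`) are hypothetical, and the choice to use overlapping windows and relative frequencies is an assumption. The secondary structure-based folding measures are not covered here.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"


def normalize(seq):
    """Upper-case the sequence and convert DNA-style T to RNA-style U."""
    return seq.upper().replace("T", "U")


def au_content(seq):
    """Fraction of A and U bases in the sequence (0.0 for an empty sequence)."""
    seq = normalize(seq)
    if not seq:
        return 0.0
    return sum(base in "AU" for base in seq) / len(seq)


def kmer_frequencies(seq, k=3):
    """Relative frequency of every overlapping k-mer over the ACGU alphabet.

    Windows containing ambiguous symbols (e.g. N) are skipped (an assumption).
    """
    seq = normalize(seq)
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(w for w in windows if set(w) <= set(ALPHABET))
    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


def feature_vector(seq):
    """AU content followed by the 64 3-mer frequencies in alphabetical order."""
    freqs = kmer_frequencies(seq, 3)
    return [au_content(seq)] + [freqs[kmer] for kmer in sorted(freqs)]
```

A vector like this could then be passed, together with folding measures, to a multi-class classifier such as scikit-learn's `SVC(kernel="rbf")`; that pairing is an assumption made for illustration, and the record does not state which software the authors used.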
format Online
Article
Text
id pubmed-10095871
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Cambridge University Press
record_format MEDLINE/PubMed
spelling pubmed-10095871
2023-04-18
NCodR: A multi-class support vector machine classification to distinguish non-coding RNAs in Viridiplantae
Nithin, Chandran
Mukherjee, Sunandan
Basak, Jolly
Bahadur, Ranjit Prasad
Quant Plant Biol
Original Research Article
Cambridge University Press
2022-10-07
/pmc/articles/PMC10095871/
/pubmed/37077974
http://dx.doi.org/10.1017/qpb.2022.18
Text
en
© The Author(s) 2022
https://creativecommons.org/licenses/by/4.0/
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
title NCodR: A multi-class support vector machine classification to distinguish non-coding RNAs in Viridiplantae
topic Original Research Article