Computational Identification of piRNAs Using Features Based on RNA Sequence, Structure, Thermodynamic and Physicochemical Properties

RATIONALE: PIWI-interacting RNAs (piRNAs) are a recently-discovered class of small non-coding RNAs (ncRNAs) with a length of 21-35 nucleotides. They play a role in gene expression regulation, transposon silencing, and viral infection inhibition. Once considered as “dark matter” of ncRNAs, piRNAs eme...

Bibliographic Details
Main authors: Monga, Isha, Banerjee, Indranil
Format: Online Article Text
Language: English
Published: Bentham Science Publishers 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7327968/
https://www.ncbi.nlm.nih.gov/pubmed/32655289
http://dx.doi.org/10.2174/1389202920666191129112705
_version_ 1783552662158966784
author Monga, Isha
Banerjee, Indranil
collection PubMed
description RATIONALE: PIWI-interacting RNAs (piRNAs) are a recently discovered class of small non-coding RNAs (ncRNAs) with a length of 21-35 nucleotides. They play a role in gene expression regulation, transposon silencing, and viral infection inhibition. Once considered the “dark matter” of ncRNAs, piRNAs have emerged as important players in multiple cellular functions in different organisms. However, our knowledge of piRNAs is still very limited, as many piRNAs have not yet been identified owing to a lack of robust computational predictive tools. METHODS: To identify novel piRNAs, we developed piRNAPred, an integrated framework for piRNA prediction employing hybrid features such as k-mer nucleotide composition, secondary structure, and thermodynamic and physicochemical properties. A non-redundant dataset (D(3349) or D(1684p+1665n)) comprising 1684 experimentally verified piRNAs and 1665 non-piRNA sequences was obtained from piRBase and NONCODE, respectively. Various sequence- and structure-based features were computed for these sequences in binary format, and models were trained using different machine learning techniques, of which the support vector machine (SVM) performed best. RESULTS: In ten-fold cross-validation (10-CV), piRNAPred achieved an overall accuracy of 98.60% with a Matthews correlation coefficient (MCC) of 0.97 and a receiver operating characteristic (ROC) value of 0.99. Furthermore, we achieved a dimensionality reduction of the feature space using an attribute selected classifier. CONCLUSION: We obtained the highest performance in accurately predicting piRNAs as compared to the current state-of-the-art piRNA predictors. piRNAPred would be helpful in expanding the piRNA repertoire and providing new insights into piRNA functions.
format Online
Article
Text
id pubmed-7327968
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Bentham Science Publishers
record_format MEDLINE/PubMed
spelling pubmed-7327968 2020-07-09 Computational Identification of piRNAs Using Features Based on RNA Sequence, Structure, Thermodynamic and Physicochemical Properties Monga, Isha Banerjee, Indranil Curr Genomics Article
Bentham Science Publishers 2019-11 2019-11 /pmc/articles/PMC7327968/ /pubmed/32655289 http://dx.doi.org/10.2174/1389202920666191129112705 Text en © 2019 Bentham Science Publishers https://creativecommons.org/licenses/by-nc/4.0/legalcode This is an open access article licensed under the terms of the Creative Commons Attribution-Non-Commercial 4.0 International Public License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/legalcode), which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.
title Computational Identification of piRNAs Using Features Based on RNA Sequence, Structure, Thermodynamic and Physicochemical Properties
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7327968/
https://www.ncbi.nlm.nih.gov/pubmed/32655289
http://dx.doi.org/10.2174/1389202920666191129112705
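The abstract in this record names k-mer nucleotide composition as one feature type and the Matthews correlation coefficient (MCC) as one evaluation measure. The sketch below illustrates both in general terms. It is not the piRNAPred implementation: the function names, the choice of k, the mapping of T to U, and the handling of ambiguous bases are all assumptions made for illustration.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGU"  # assumption: sequences are treated as RNA


def kmer_composition(seq, k=2):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Hypothetical helper: the record does not specify how piRNAPred
    encodes k-mer composition.
    """
    seq = seq.upper().replace("T", "U")  # assumption: accept DNA-style input
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def matthews_corrcoef(tp, tn, fp, fn):
    """Compute the MCC from confusion-matrix counts (0.0 if undefined)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    features = kmer_composition("UGACU", k=2)
    print(len(features))  # one value per possible dinucleotide
    print(matthews_corrcoef(tp=10, tn=10, fp=0, fn=0))
```

A feature vector of this kind has 4^k entries per sequence, so it can be concatenated with other feature groups before being passed to a classifier such as an SVM.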