
A comparative analysis of machine learning classifiers for predicting protein-binding nucleotides in RNA sequences

Bibliographic Details
Main authors: Agarwal, Ankita; Singh, Kunal; Kant, Shri; Bahadur, Ranjit Prasad
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology, 2022
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9249596/
https://www.ncbi.nlm.nih.gov/pubmed/35832617
http://dx.doi.org/10.1016/j.csbj.2022.06.036
Collection: PubMed
Abstract: RNA-protein interactions play vital roles in driving the cellular machinery. Despite their significant involvement in several biological processes, the underlying molecular mechanism of RNA-protein interactions is still elusive. This may be due to the experimental difficulties in solving co-crystallized RNA-protein complexes. The inherent flexibility of RNA molecules to adopt different conformations makes them functionally diverse, and their interactions with proteins have implications in RNA disease biology. Thus, the study of binding interfaces can provide mechanistic insight into their molecular functioning and into the aberrations caused by altered interactions. Moreover, high-throughput sequencing technologies have generated far more sequence data than the structural data available for RNA-protein complexes. In such a scenario, efficient computational algorithms are required to identify the protein-binding interfaces of RNA in the absence of known structures. We have investigated several machine learning classifiers and various features derived from nucleotide sequences to identify protein-binding nucleotides in RNA. We achieve the best performance with nucleotide-triplet and nucleotide-quartet feature-based random forest models, with an overall accuracy of 84.8%, sensitivity of 83.2%, specificity of 86.1%, MCC of 0.70 and AUC of 0.93. We have further implemented the developed models in a user-friendly webserver, “Nucpred”, which is freely accessible at “http://www.csb.iitkgp.ac.in/applications/Nucpred/index”.
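
The abstract reports threshold-dependent measures (accuracy, sensitivity, specificity, MCC) together with AUC. As a reading aid, the following Python sketch shows the standard textbook definitions of these measures; it is not code from the paper or from Nucpred. It assumes protein-binding nucleotides are the positive class, and the function names (`binary_metrics`, `roc_auc`) as well as the counts and scores in the usage lines are hypothetical, not the study's data.

```python
import math
from typing import NamedTuple, Sequence


class BinaryMetrics(NamedTuple):
    accuracy: float
    sensitivity: float  # true-positive rate (recall on binding nucleotides)
    specificity: float  # true-negative rate (recall on non-binding nucleotides)
    mcc: float          # Matthews correlation coefficient


def binary_metrics(*, tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Compute threshold-dependent metrics from confusion-matrix counts.

    Assumption: protein-binding nucleotides are the positive class.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # MCC is set to 0 here when the denominator is zero (a common convention).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return BinaryMetrics(accuracy, sensitivity, specificity, mcc)


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation); O(n*m), fine for small sets.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts and scores, for illustration only.
    print(binary_metrics(tp=30, fp=10, tn=50, fn=10))
    print(roc_auc([0.9, 0.8], [0.1, 0.8]))
```

With the hypothetical counts above, `binary_metrics(tp=30, fp=10, tn=50, fn=10)` gives an accuracy of 0.80, a sensitivity of 0.75, a specificity of about 0.833 and an MCC of about 0.583, and `roc_auc([0.9, 0.8], [0.1, 0.8])` gives 0.875. In practice, scikit-learn's `matthews_corrcoef` and `roc_auc_score` compute the same quantities.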
Record ID: pubmed-9249596
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Comput Struct Biotechnol J, Research Article (PubMed record pubmed-9249596, 2022-07-12)
Publication date: 2022-06-17
Copyright: © 2022 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).