RNABindRPlus: A Predictor that Combines Machine Learning and Sequence Homology-Based Methods to Improve the Reliability of Predicted RNA-Binding Residues in Proteins

Protein-RNA interactions are central to essential cellular processes such as protein synthesis and regulation of gene expression and play roles in human infectious and genetic diseases. Reliable identification of protein-RNA interfaces is critical for understanding the structural bases and functiona...

Bibliographic Details
Main authors: Walia, Rasna R., Xue, Li C., Wilkins, Katherine, El-Manzalawy, Yasser, Dobbs, Drena, Honavar, Vasant
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4028231/
https://www.ncbi.nlm.nih.gov/pubmed/24846307
http://dx.doi.org/10.1371/journal.pone.0097725
author Walia, Rasna R.
Xue, Li C.
Wilkins, Katherine
El-Manzalawy, Yasser
Dobbs, Drena
Honavar, Vasant
author_sort Walia, Rasna R.
collection PubMed
description Protein-RNA interactions are central to essential cellular processes such as protein synthesis and regulation of gene expression and play roles in human infectious and genetic diseases. Reliable identification of protein-RNA interfaces is critical for understanding the structural bases and functional implications of such interactions and for developing effective approaches to rational drug design. Sequence-based computational methods offer a viable, cost-effective way to identify putative RNA-binding residues in RNA-binding proteins. Here we report two novel approaches: (i) HomPRIP, a sequence homology-based method for predicting RNA-binding sites in proteins; (ii) RNABindRPlus, a new method that combines predictions from HomPRIP with those from an optimized Support Vector Machine (SVM) classifier trained on a benchmark dataset of 198 RNA-binding proteins. Although highly reliable, HomPRIP cannot make predictions for the unaligned parts of query proteins and its coverage is limited by the availability of close sequence homologs of the query protein with experimentally determined RNA-binding sites. RNABindRPlus overcomes these limitations. We compared the performance of HomPRIP and RNABindRPlus with that of several state-of-the-art predictors on two test sets, RB44 and RB111. On a subset of proteins for which homologs with experimentally determined interfaces could be reliably identified, HomPRIP outperformed all other methods achieving an MCC of 0.63 on RB44 and 0.83 on RB111. RNABindRPlus was able to predict RNA-binding residues of all proteins in both test sets, achieving an MCC of 0.55 and 0.37, respectively, and outperforming all other methods, including those that make use of structure-derived features of proteins. More importantly, RNABindRPlus outperforms all other methods for any choice of tradeoff between precision and recall. 
An important advantage of both HomPRIP and RNABindRPlus is that they rely on readily available sequence and sequence-derived features of RNA-binding proteins. A webserver implementation of both methods is freely available at http://einstein.cs.iastate.edu/RNABindRPlus/.
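The abstract reports results as Matthews correlation coefficient (MCC) values and refers to the tradeoff between precision and recall. As a minimal sketch of how these per-residue metrics are computed from binary labels, the following uses the standard definitions. The function names and the toy label vectors are illustrative assumptions; they are not taken from the paper or from the RNABindRPlus server.

```python
import math

def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for binary labels (1 = RNA-binding residue)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, fp, tn, fn

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

def precision_recall(tp, fp, fn):
    """Precision and recall; each is 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy example (hypothetical labels for eight residues, not data from the paper).
actual = [1, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 0, 1, 1, 0, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)
score = mcc(tp, fp, tn, fn)
precision, recall = precision_recall(tp, fp, fn)
print(tp, fp, tn, fn, round(score, 3), round(precision, 3), round(recall, 3))
```

MCC ranges from -1 to 1, with 1 for perfect agreement and 0 for predictions no better than chance, which makes it a common choice when binding residues are much rarer than non-binding ones.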
format Online
Article
Text
id pubmed-4028231
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4028231 2014-05-21 PLoS One Research Article Public Library of Science 2014-05-20 /pmc/articles/PMC4028231/ /pubmed/24846307 http://dx.doi.org/10.1371/journal.pone.0097725 Text en © 2014 Walia et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title RNABindRPlus: A Predictor that Combines Machine Learning and Sequence Homology-Based Methods to Improve the Reliability of Predicted RNA-Binding Residues in Proteins
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4028231/
https://www.ncbi.nlm.nih.gov/pubmed/24846307
http://dx.doi.org/10.1371/journal.pone.0097725