FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues

A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses, are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challen...

Bibliographic Details
Main Authors:	EL-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2016
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4934694/ https://www.ncbi.nlm.nih.gov/pubmed/27383535 http://dx.doi.org/10.1371/journal.pone.0158445

author	EL-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant
author_sort	EL-Manzalawy, Yasser
collection	PubMed
description	A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses, are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles, as judged by three criteria: the number of hits used to generate the profile, the distance of the generated profile from the corresponding profile generated using the entire UniRef100 data, and the accuracy of the machine learning classifier trained using these profiles. Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission.
Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential to substantially speed up, and hence increase the practical utility of, other amino acid sequence-based predictors of protein-protein and protein-DNA interfaces.
format	Online Article Text
id	pubmed-4934694
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4934694 2016-07-18 FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues EL-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant PLoS One Research Article Public Library of Science 2016-07-06 /pmc/articles/PMC4934694/ /pubmed/27383535 http://dx.doi.org/10.1371/journal.pone.0158445 Text en © 2016 EL-Manzalawy et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues
title_sort	fastrnabindr: fast and accurate prediction of protein-rna interface residues
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4934694/ https://www.ncbi.nlm.nih.gov/pubmed/27383535 http://dx.doi.org/10.1371/journal.pone.0158445
