
Predicting RNA-Protein Interactions Using Only Sequence Information

BACKGROUND: RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High throughput experiments to identify RNA-protein interactions are beginn...


Bibliographic Details
Main authors: Muppirala, Usha K, Honavar, Vasant G, Dobbs, Drena
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3322362/
https://www.ncbi.nlm.nih.gov/pubmed/22192482
http://dx.doi.org/10.1186/1471-2105-12-489
_version_ 1782229058757591040
author Muppirala, Usha K
Honavar, Vasant G
Dobbs, Drena
author_facet Muppirala, Usha K
Honavar, Vasant G
Dobbs, Drena
author_sort Muppirala, Usha K
collection PubMed
description BACKGROUND: RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but are expensive and time consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions. RESULTS: We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPISeq predicts whether or not the RNA-protein pair interacts. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved an AUC (Area Under the Receiver Operating Characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners. In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens.
CONCLUSIONS: Our experiments with RPISeq demonstrate that RNA-protein interactions can be reliably predicted using only sequence-derived information. RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks, and should provide useful insights into the function of non-coding RNAs. RPISeq is freely available as a web-based server at http://pridb.gdcb.iastate.edu/RPISeq/.
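The encoding described in the abstract (a normalized 4-mer composition vector for the RNA and a normalized 3-mer composition vector for the protein over a 7-letter reduced alphabet) can be illustrated with a minimal sketch. Several details below are assumptions, because the abstract does not state them: the 7-class amino acid grouping (a conjoint-triad style grouping is used here), normalization by the total k-mer count, the concatenation of the two vectors into one feature vector, and all function names, which are hypothetical. This is not the RPISeq implementation.

```python
from itertools import product

# ASSUMPTION: 7-class amino acid grouping (conjoint-triad style).
# The abstract only says "7-letter reduced alphabet" and does not list the classes.
AA_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: str(i) for i, group in enumerate(AA_CLASSES) for aa in group}


def kmer_composition(seq, alphabet, k):
    """Return k-mer frequencies over `alphabet`, ordered lexicographically by k-mer."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows containing unknown symbols are skipped
            counts[window] += 1
    total = sum(counts.values())
    # ASSUMPTION: "normalized" is taken to mean division by the total k-mer count.
    return [counts[m] / total if total else 0.0 for m in kmers]


def encode_rna(rna):
    """4-mer composition over A, C, G, U: 4**4 = 256 features."""
    return kmer_composition(rna.upper().replace("T", "U"), "ACGU", 4)


def encode_protein(protein):
    """3-mer composition over the 7 reduced classes: 7**3 = 343 features."""
    reduced = "".join(AA_TO_CLASS.get(aa, "X") for aa in protein.upper())
    return kmer_composition(reduced, "0123456", 3)


def encode_pair(rna, protein):
    """ASSUMPTION: the pair is represented by concatenating both vectors (599 features)."""
    return encode_rna(rna) + encode_protein(protein)
```

A vector produced by the hypothetical `encode_pair` could then be passed to any standard SVM or Random Forest implementation, the two classifier types the abstract names.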
format Online
Article
Text
id pubmed-3322362
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-33223622012-04-18 Predicting RNA-Protein Interactions Using Only Sequence Information Muppirala, Usha K Honavar, Vasant G Dobbs, Drena BMC Bioinformatics Research Article BACKGROUND: RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but are expensive and time consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions. RESULTS: We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPISeq predicts whether or not the RNA-protein pair interacts. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved an AUC (Area Under the Receiver Operating Characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners.
In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens. CONCLUSIONS: Our experiments with RPISeq demonstrate that RNA-protein interactions can be reliably predicted using only sequence-derived information. RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks, and should provide useful insights into the function of non-coding RNAs. RPISeq is freely available as a web-based server at http://pridb.gdcb.iastate.edu/RPISeq/. BioMed Central 2011-12-22 /pmc/articles/PMC3322362/ /pubmed/22192482 http://dx.doi.org/10.1186/1471-2105-12-489 Text en Copyright ©2011 Muppirala et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Muppirala, Usha K
Honavar, Vasant G
Dobbs, Drena
Predicting RNA-Protein Interactions Using Only Sequence Information
title Predicting RNA-Protein Interactions Using Only Sequence Information
title_full Predicting RNA-Protein Interactions Using Only Sequence Information
title_fullStr Predicting RNA-Protein Interactions Using Only Sequence Information
title_full_unstemmed Predicting RNA-Protein Interactions Using Only Sequence Information
title_short Predicting RNA-Protein Interactions Using Only Sequence Information
title_sort predicting rna-protein interactions using only sequence information
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3322362/
https://www.ncbi.nlm.nih.gov/pubmed/22192482
http://dx.doi.org/10.1186/1471-2105-12-489
work_keys_str_mv AT muppiralaushak predictingrnaproteininteractionsusingonlysequenceinformation
AT honavarvasantg predictingrnaproteininteractionsusingonlysequenceinformation
AT dobbsdrena predictingrnaproteininteractionsusingonlysequenceinformation