
Predicting RNA-binding sites of proteins using support vector machines and evolutionary information

BACKGROUND: RNA-protein interaction plays an essential role in several biological processes, such as protein synthesis, gene expression, posttranscriptional regulation and viral infectivity. Identification of RNA-binding sites in proteins provides valuable insights for biologists. However, experimen...


Bibliographic Details
Main authors: Cheng, Cheng-Wei; Su, Emily Chia-Yu; Hwang, Jenn-Kang; Sung, Ting-Yi; Hsu, Wen-Lian
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2638146/
https://www.ncbi.nlm.nih.gov/pubmed/19091029
http://dx.doi.org/10.1186/1471-2105-9-S12-S6
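
The abstract of this record describes a smoothed PSSM encoding in which each residue's profile also reflects its neighboring residues. The record does not give the formula, so the following is only a minimal sketch under stated assumptions: each smoothed row is taken to be the element-wise sum of the PSSM rows inside an odd-sized window centred on the residue, with positions beyond either terminus treated as zeros. The function names `smooth_pssm` and `encode_residue`, the default window sizes, and the toy two-column matrix (a real PSSM has 20 columns) are all hypothetical and are not taken from RNAProB.

```python
from typing import List

Row = List[float]


def smooth_pssm(pssm: List[Row], window: int = 3) -> List[Row]:
    """Sum each PSSM row with its neighbours inside an odd-sized window.

    Assumption: rows beyond the sequence termini contribute zeros.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(pssm)
    width = len(pssm[0]) if pssm else 0
    smoothed: List[Row] = []
    for i in range(n):
        acc = [0.0] * width
        # Only rows that actually exist are added; missing rows act as zeros.
        for j in range(max(0, i - half), min(n, i + half + 1)):
            for k in range(width):
                acc[k] += pssm[j][k]
        smoothed.append(acc)
    return smoothed


def encode_residue(smoothed: List[Row], i: int, window: int = 3) -> Row:
    """Concatenate the smoothed rows around residue i into one feature vector.

    Assumption: window positions outside the sequence are zero padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    width = len(smoothed[0])
    features: Row = []
    for j in range(i - half, i + half + 1):
        if 0 <= j < len(smoothed):
            features.extend(smoothed[j])
        else:
            features.extend([0.0] * width)
    return features


# Toy profile: 4 residues, 2 score columns (illustrative values only).
toy = [[1.0, 0.0], [2.0, -1.0], [0.0, 3.0], [4.0, 1.0]]
toy_smoothed = smooth_pssm(toy, window=3)
toy_features = encode_residue(toy_smoothed, 0, window=3)
```

Feature vectors built this way, one per residue, could then be passed to any support vector machine implementation. The kernel and parameters used by the authors are not stated in this record.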
_version_ 1782164395348983808
author Cheng, Cheng-Wei
Su, Emily Chia-Yu
Hwang, Jenn-Kang
Sung, Ting-Yi
Hsu, Wen-Lian
author_facet Cheng, Cheng-Wei
Su, Emily Chia-Yu
Hwang, Jenn-Kang
Sung, Ting-Yi
Hsu, Wen-Lian
author_sort Cheng, Cheng-Wei
collection PubMed
description BACKGROUND: RNA-protein interaction plays an essential role in several biological processes, such as protein synthesis, gene expression, posttranscriptional regulation and viral infectivity. Identification of RNA-binding sites in proteins provides valuable insights for biologists. However, experimental determination of RNA-protein interaction remains time-consuming and labor-intensive. Thus, computational approaches for prediction of RNA-binding sites in proteins have become highly desirable. Extensive studies of RNA-binding site prediction have led to the development of several methods. However, these methods can yield low sensitivity in exchange for high specificity. RESULTS: We propose a method, RNAProB, which incorporates a new smoothed position-specific scoring matrix (PSSM) encoding scheme with a support vector machine model to predict RNA-binding sites in proteins. Besides incorporating evolutionary information from standard PSSM profiles, the proposed smoothed PSSM encoding scheme also considers the correlation and dependency from the neighboring residues for each amino acid in a protein. Experimental results show that smoothed PSSM encoding significantly enhances prediction performance, especially sensitivity. Using five-fold cross-validation, our method outperforms the state-of-the-art systems by 4.90%~6.83%, 0.88%~5.33%, and 0.10~0.23 in terms of overall accuracy, specificity, and Matthews correlation coefficient, respectively. Most notably, compared to other approaches, RNAProB significantly improves sensitivity by 7.0%~26.9% over the benchmark data sets. To prevent overfitting, a three-way data split procedure is incorporated to estimate the prediction performance. Moreover, physicochemical properties and amino acid preferences of RNA-binding proteins are examined and analyzed.
CONCLUSION: Our results demonstrate that the smoothed PSSM encoding scheme significantly enhances the performance of RNA-binding site prediction in proteins. This also supports our assumption that smoothed PSSM encoding can better resolve the ambiguity of discriminating between interacting and non-interacting residues by modelling the dependency from surrounding residues. The proposed method can be used in other research areas, such as DNA-binding site prediction, protein-protein interaction, and prediction of posttranslational modification sites.
format Text
id pubmed-2638146
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-26381462009-02-24 Predicting RNA-binding sites of proteins using support vector machines and evolutionary information Cheng, Cheng-Wei Su, Emily Chia-Yu Hwang, Jenn-Kang Sung, Ting-Yi Hsu, Wen-Lian BMC Bioinformatics Research BACKGROUND: RNA-protein interaction plays an essential role in several biological processes, such as protein synthesis, gene expression, posttranscriptional regulation and viral infectivity. Identification of RNA-binding sites in proteins provides valuable insights for biologists. However, experimental determination of RNA-protein interaction remains time-consuming and labor-intensive. Thus, computational approaches for prediction of RNA-binding sites in proteins have become highly desirable. Extensive studies of RNA-binding site prediction have led to the development of several methods. However, these methods can yield low sensitivity in exchange for high specificity. RESULTS: We propose a method, RNAProB, which incorporates a new smoothed position-specific scoring matrix (PSSM) encoding scheme with a support vector machine model to predict RNA-binding sites in proteins. Besides incorporating evolutionary information from standard PSSM profiles, the proposed smoothed PSSM encoding scheme also considers the correlation and dependency from the neighboring residues for each amino acid in a protein. Experimental results show that smoothed PSSM encoding significantly enhances prediction performance, especially sensitivity. Using five-fold cross-validation, our method outperforms the state-of-the-art systems by 4.90%~6.83%, 0.88%~5.33%, and 0.10~0.23 in terms of overall accuracy, specificity, and Matthews correlation coefficient, respectively. Most notably, compared to other approaches, RNAProB significantly improves sensitivity by 7.0%~26.9% over the benchmark data sets. To prevent overfitting, a three-way data split procedure is incorporated to estimate the prediction performance.
Moreover, physicochemical properties and amino acid preferences of RNA-binding proteins are examined and analyzed. CONCLUSION: Our results demonstrate that the smoothed PSSM encoding scheme significantly enhances the performance of RNA-binding site prediction in proteins. This also supports our assumption that smoothed PSSM encoding can better resolve the ambiguity of discriminating between interacting and non-interacting residues by modelling the dependency from surrounding residues. The proposed method can be used in other research areas, such as DNA-binding site prediction, protein-protein interaction, and prediction of posttranslational modification sites. BioMed Central 2008-12-12 /pmc/articles/PMC2638146/ /pubmed/19091029 http://dx.doi.org/10.1186/1471-2105-9-S12-S6 Text en Copyright © 2008 Cheng et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Cheng, Cheng-Wei
Su, Emily Chia-Yu
Hwang, Jenn-Kang
Sung, Ting-Yi
Hsu, Wen-Lian
Predicting RNA-binding sites of proteins using support vector machines and evolutionary information
title Predicting RNA-binding sites of proteins using support vector machines and evolutionary information
title_full Predicting RNA-binding sites of proteins using support vector machines and evolutionary information
title_fullStr Predicting RNA-binding sites of proteins using support vector machines and evolutionary information
title_full_unstemmed Predicting RNA-binding sites of proteins using support vector machines and evolutionary information
title_short Predicting RNA-binding sites of proteins using support vector machines and evolutionary information
title_sort predicting rna-binding sites of proteins using support vector machines and evolutionary information
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2638146/
https://www.ncbi.nlm.nih.gov/pubmed/19091029
http://dx.doi.org/10.1186/1471-2105-9-S12-S6
work_keys_str_mv AT chengchengwei predictingrnabindingsitesofproteinsusingsupportvectormachinesandevolutionaryinformation
AT suemilychiayu predictingrnabindingsitesofproteinsusingsupportvectormachinesandevolutionaryinformation
AT hwangjennkang predictingrnabindingsitesofproteinsusingsupportvectormachinesandevolutionaryinformation
AT sungtingyi predictingrnabindingsitesofproteinsusingsupportvectormachinesandevolutionaryinformation
AT hsuwenlian predictingrnabindingsitesofproteinsusingsupportvectormachinesandevolutionaryinformation
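
The abstract in this record reports results as overall accuracy, sensitivity, specificity, and Matthews correlation coefficient. As a hedged illustration of how these four measures relate to per-residue prediction counts, the sketch below computes them from a binary confusion matrix using their standard definitions. The function name `binding_site_metrics` and the example counts are hypothetical and are not results from the paper.

```python
import math
from typing import Dict


def binding_site_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification measures from confusion-matrix counts.

    tp/fn: binding residues predicted as binding / non-binding.
    tn/fp: non-binding residues predicted as non-binding / binding.
    Ratios with a zero denominator are reported as 0.0 (a convention chosen here).
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Illustrative counts only (not from the paper).
example = binding_site_metrics(tp=40, fp=10, tn=90, fn=20)
```

Because binding residues are usually a minority class, a predictor can reach high accuracy and specificity while missing most binding sites. Sensitivity and the Matthews correlation coefficient make that imbalance visible.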