Improving protein–protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model
Predicting protein–protein interactions (PPIs) is a challenging task that is essential for constructing protein interaction networks, which are important for understanding the mechanisms of biological systems. Although a number of high‐throughput technologies have been proposed to pre...
Main authors: | An, Ji‐Yong; Meng, Fan‐Rong; You, Zhu‐Hong; Chen, Xing; Yan, Gui‐Ying; Hu, Ji‐Pu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc. 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5029537/ https://www.ncbi.nlm.nih.gov/pubmed/27452983 http://dx.doi.org/10.1002/pro.2991 |
_version_ | 1782454531011903488 |
---|---|
author | An, Ji‐Yong Meng, Fan‐Rong You, Zhu‐Hong Chen, Xing Yan, Gui‐Ying Hu, Ji‐Pu |
author_facet | An, Ji‐Yong Meng, Fan‐Rong You, Zhu‐Hong Chen, Xing Yan, Gui‐Ying Hu, Ji‐Pu |
author_sort | An, Ji‐Yong |
collection | PubMed |
description | Predicting protein–protein interactions (PPIs) is a challenging task that is essential for constructing protein interaction networks, which are important for understanding the mechanisms of biological systems. Although a number of high‐throughput technologies have been proposed to predict PPIs, they have unavoidable shortcomings, including high cost, time intensity, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM‐BiGP that combines the relevance vector machine (RVM) model and Bi‐gram Probabilities (BiGP) for PPI detection from protein sequences. The major improvements are: (1) protein sequences are represented using the Bi‐gram Probabilities (BiGP) feature representation on a Position Specific Scoring Matrix (PSSM), which contains the protein evolutionary information; (2) to reduce the influence of noise, Principal Component Analysis (PCA) is used to reduce the dimension of the BiGP vector; (3) the powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five‐fold cross‐validation experiments on the yeast and Helicobacter pylori datasets achieved very high accuracies of 94.57% and 90.57%, respectively. These experimental results are significantly better than those of previous methods. To further evaluate the proposed method, we compare it with the state‐of‐the‐art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM‐BiGP method is significantly better than the SVM‐based method. In addition, we achieved 97.15% accuracy on an imbalanced yeast dataset, which is higher than that on the balanced yeast dataset. The promising experimental results show the efficiency and robustness of the proposed method, which can serve as an automatic decision support tool for future proteomics research. To facilitate extensive studies in future proteomics research, we developed a freely available web server called RVM‐BiGP‐PPIs in Hypertext Preprocessor (PHP) for predicting PPIs. The web server, including source code and the datasets, is available at http://219.219.62.123:8888/BiGP/. |
format | Online Article Text |
id | pubmed-5029537 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-50295372016-09-26 Improving protein–protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model An, Ji‐Yong Meng, Fan‐Rong You, Zhu‐Hong Chen, Xing Yan, Gui‐Ying Hu, Ji‐Pu Protein Sci Articles Predicting protein–protein interactions (PPIs) is a challenging task that is essential for constructing protein interaction networks, which are important for understanding the mechanisms of biological systems. Although a number of high‐throughput technologies have been proposed to predict PPIs, they have unavoidable shortcomings, including high cost, time intensity, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM‐BiGP that combines the relevance vector machine (RVM) model and Bi‐gram Probabilities (BiGP) for PPI detection from protein sequences. The major improvements are: (1) protein sequences are represented using the Bi‐gram Probabilities (BiGP) feature representation on a Position Specific Scoring Matrix (PSSM), which contains the protein evolutionary information; (2) to reduce the influence of noise, Principal Component Analysis (PCA) is used to reduce the dimension of the BiGP vector; (3) the powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five‐fold cross‐validation experiments on the yeast and Helicobacter pylori datasets achieved very high accuracies of 94.57% and 90.57%, respectively. These experimental results are significantly better than those of previous methods. To further evaluate the proposed method, we compare it with the state‐of‐the‐art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM‐BiGP method is significantly better than the SVM‐based method. In addition, we achieved 97.15% accuracy on an imbalanced yeast dataset, which is higher than that on the balanced yeast dataset. The promising experimental results show the efficiency and robustness of the proposed method, which can serve as an automatic decision support tool for future proteomics research. To facilitate extensive studies in future proteomics research, we developed a freely available web server called RVM‐BiGP‐PPIs in Hypertext Preprocessor (PHP) for predicting PPIs. The web server, including source code and the datasets, is available at http://219.219.62.123:8888/BiGP/. John Wiley and Sons Inc. 2016-08-09 2016-10 /pmc/articles/PMC5029537/ /pubmed/27452983 http://dx.doi.org/10.1002/pro.2991 Text en © 2016 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs (http://creativecommons.org/licenses/by-nc-nd/4.0/) License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made. |
spellingShingle | Articles An, Ji‐Yong Meng, Fan‐Rong You, Zhu‐Hong Chen, Xing Yan, Gui‐Ying Hu, Ji‐Pu Improving protein–protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model |
title | Improving protein–protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model |
title_full | Improving protein–protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model |
title_fullStr | Improving protein–protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model |
title_full_unstemmed | Improving protein–protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model |
title_short | Improving protein–protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model |
title_sort | improving protein–protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model |
topic | Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5029537/ https://www.ncbi.nlm.nih.gov/pubmed/27452983 http://dx.doi.org/10.1002/pro.2991 |
work_keys_str_mv | AT anjiyong improvingproteinproteininteractionspredictionaccuracyusingproteinevolutionaryinformationandrelevancevectormachinemodel AT mengfanrong improvingproteinproteininteractionspredictionaccuracyusingproteinevolutionaryinformationandrelevancevectormachinemodel AT youzhuhong improvingproteinproteininteractionspredictionaccuracyusingproteinevolutionaryinformationandrelevancevectormachinemodel AT chenxing improvingproteinproteininteractionspredictionaccuracyusingproteinevolutionaryinformationandrelevancevectormachinemodel AT yanguiying improvingproteinproteininteractionspredictionaccuracyusingproteinevolutionaryinformationandrelevancevectormachinemodel AT hujipu improvingproteinproteininteractionspredictionaccuracyusingproteinevolutionaryinformationandrelevancevectormachinemodel |
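
The description above outlines a pipeline of bi-gram probability features computed on a PSSM, followed by PCA. The record itself gives no formulas, so the following is only a minimal sketch under stated assumptions: it uses the commonly cited bi-gram definition B[m, n] = Σ_i P[i, m] · P[i+1, n] over an L×20 PSSM, represents a protein pair by concatenating the two 400-dimensional vectors, and performs PCA through a plain SVD. The function names `bigram_features`, `pair_features`, and `pca_reduce` are hypothetical, and the RVM classification step is not shown. This may differ from what the authors actually implemented.

```python
import numpy as np


def bigram_features(pssm):
    """Flattened 20x20 bi-gram matrix of an L x 20 PSSM.

    Assumed definition (not stated in the record):
    B[m, n] = sum over i of P[i, m] * P[i + 1, n].
    """
    pssm = np.asarray(pssm, dtype=float)
    if pssm.ndim != 2 or pssm.shape[1] != 20:
        raise ValueError("expected a PSSM of shape (L, 20)")
    # Rows 0..L-2 paired with rows 1..L-1: (20, L-1) @ (L-1, 20) -> (20, 20)
    bigram = pssm[:-1].T @ pssm[1:]
    return bigram.ravel()


def pair_features(pssm_a, pssm_b):
    """Assumed pair encoding: concatenate both proteins' 400-d vectors."""
    return np.concatenate([bigram_features(pssm_a), bigram_features(pssm_b)])


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

In this sketch the reduced matrix returned by `pca_reduce` would be the input to a classifier; the number of retained components is a free parameter that the record does not specify.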