Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information
Identification of protein–protein interactions (PPIs) is a difficult and important problem in biology. Since experimental methods for predicting PPIs are both expensive and time-consuming, many computational methods have been developed to predict PPIs and interaction networks, which can be used to c...
Main authors: | Ding, Yijie; Tang, Jijun; Guo, Fei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5085656/ https://www.ncbi.nlm.nih.gov/pubmed/27669239 http://dx.doi.org/10.3390/ijms17101623 |
_version_ | 1782463613658726400 |
---|---|
author | Ding, Yijie Tang, Jijun Guo, Fei |
author_facet | Ding, Yijie Tang, Jijun Guo, Fei |
author_sort | Ding, Yijie |
collection | PubMed |
description | Identification of protein–protein interactions (PPIs) is a difficult and important problem in biology. Since experimental methods for predicting PPIs are both expensive and time-consuming, many computational methods have been developed to predict PPIs and interaction networks, which can be used to complement experimental approaches. However, these methods have limitations: they require a large number of homologous proteins or literature sources. In this paper, we propose a novel matrix-based protein sequence representation approach to predict PPIs, using an ensemble learning method for classification. We construct the matrix of Amino Acid Contact (AAC), based on the statistical analysis of residue-pairing frequencies in a database of 6323 protein–protein complexes. We first represent the protein sequence as a Substitution Matrix Representation (SMR) matrix. Then, the feature vector is extracted by applying the Histogram of Oriented Gradient (HOG) and Singular Value Decomposition (SVD) algorithms to the SMR matrix. Finally, we feed the feature vector into a Random Forest (RF) to distinguish interacting pairs from non-interacting pairs. Our method is applied to several PPI datasets to evaluate its performance. On the [Formula: see text] dataset, our method achieves [Formula: see text] accuracy and [Formula: see text] sensitivity. Compared with existing methods, the accuracy of our method is increased by [Formula: see text] percentage points. On the [Formula: see text] dataset, our method achieves [Formula: see text] accuracy and [Formula: see text] sensitivity, and the accuracy of our method is increased by [Formula: see text]. On the [Formula: see text] PPI dataset, our method achieves [Formula: see text] accuracy and [Formula: see text] sensitivity, and the accuracy of our method is increased by [Formula: see text]. In addition, we test our method on a very important PPI network, and it achieves [Formula: see text] accuracy. In the Wnt-related network, the accuracy of our method is increased by [Formula: see text]. The source code and all datasets are available at https://figshare.com/s/580c11dce13e63cb9a53. |
format | Online Article Text |
id | pubmed-5085656 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-5085656 2016-11-01 Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information Ding, Yijie Tang, Jijun Guo, Fei Int J Mol Sci Article Identification of protein–protein interactions (PPIs) is a difficult and important problem in biology. Since experimental methods for predicting PPIs are both expensive and time-consuming, many computational methods have been developed to predict PPIs and interaction networks, which can be used to complement experimental approaches. However, these methods have limitations: they require a large number of homologous proteins or literature sources. In this paper, we propose a novel matrix-based protein sequence representation approach to predict PPIs, using an ensemble learning method for classification. We construct the matrix of Amino Acid Contact (AAC), based on the statistical analysis of residue-pairing frequencies in a database of 6323 protein–protein complexes. We first represent the protein sequence as a Substitution Matrix Representation (SMR) matrix. Then, the feature vector is extracted by applying the Histogram of Oriented Gradient (HOG) and Singular Value Decomposition (SVD) algorithms to the SMR matrix. Finally, we feed the feature vector into a Random Forest (RF) to distinguish interacting pairs from non-interacting pairs. Our method is applied to several PPI datasets to evaluate its performance. On the [Formula: see text] dataset, our method achieves [Formula: see text] accuracy and [Formula: see text] sensitivity. Compared with existing methods, the accuracy of our method is increased by [Formula: see text] percentage points. On the [Formula: see text] dataset, our method achieves [Formula: see text] accuracy and [Formula: see text] sensitivity, and the accuracy of our method is increased by [Formula: see text]. On the [Formula: see text] PPI dataset, our method achieves [Formula: see text] accuracy and [Formula: see text] sensitivity, and the accuracy of our method is increased by [Formula: see text]. In addition, we test our method on a very important PPI network, and it achieves [Formula: see text] accuracy. In the Wnt-related network, the accuracy of our method is increased by [Formula: see text]. The source code and all datasets are available at https://figshare.com/s/580c11dce13e63cb9a53. MDPI 2016-09-24 /pmc/articles/PMC5085656/ /pubmed/27669239 http://dx.doi.org/10.3390/ijms17101623 Text en © 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Ding, Yijie Tang, Jijun Guo, Fei Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information |
title | Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information |
title_full | Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information |
title_fullStr | Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information |
title_full_unstemmed | Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information |
title_short | Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information |
title_sort | identification of protein–protein interactions via a novel matrix-based sequence representation model with amino acid contact information |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5085656/ https://www.ncbi.nlm.nih.gov/pubmed/27669239 http://dx.doi.org/10.3390/ijms17101623 |
work_keys_str_mv | AT dingyijie identificationofproteinproteininteractionsviaanovelmatrixbasedsequencerepresentationmodelwithaminoacidcontactinformation AT tangjijun identificationofproteinproteininteractionsviaanovelmatrixbasedsequencerepresentationmodelwithaminoacidcontactinformation AT guofei identificationofproteinproteininteractionsviaanovelmatrixbasedsequencerepresentationmodelwithaminoacidcontactinformation |