Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information

Identification of protein–protein interactions (PPIs) is a difficult and important problem in biology. Since experimental methods for predicting PPIs are both expensive and time-consuming, many computational methods have been developed to predict PPIs and interaction networks, which can be used to c...

Bibliographic Details
Main authors: Ding, Yijie; Tang, Jijun; Guo, Fei
Format: Online Article Text
Language: English
Published: MDPI 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5085656/
https://www.ncbi.nlm.nih.gov/pubmed/27669239
http://dx.doi.org/10.3390/ijms17101623
_version_ 1782463613658726400
author Ding, Yijie
Tang, Jijun
Guo, Fei
collection PubMed
description Identification of protein–protein interactions (PPIs) is a difficult and important problem in biology. Since experimental methods for identifying PPIs are both expensive and time-consuming, many computational methods have been developed to predict PPIs and interaction networks, which can be used to complement experimental approaches. However, these methods have limitations: they require a large number of homologous proteins or literature sources. In this paper, we propose a novel matrix-based protein sequence representation approach to predict PPIs, using an ensemble learning method for classification. We construct the Amino Acid Contact (AAC) matrix based on a statistical analysis of residue-pairing frequencies in a database of 6323 protein–protein complexes. We first represent each protein sequence as a Substitution Matrix Representation (SMR) matrix. Then, the feature vector is extracted by applying the Histogram of Oriented Gradient (HOG) and Singular Value Decomposition (SVD) algorithms to the SMR matrix. Finally, we feed the feature vector into a Random Forest (RF) to classify interacting and non-interacting pairs. Our method is applied to several PPI datasets to evaluate its performance. On the [Formula: see text] dataset, our method achieves [Formula: see text] accuracy and [Formula: see text] sensitivity; compared with existing methods, its accuracy is higher by [Formula: see text] percentage points. On the [Formula: see text] dataset, our method achieves [Formula: see text] accuracy and [Formula: see text] sensitivity, and its accuracy is higher by [Formula: see text]. On the [Formula: see text] PPI dataset, our method achieves [Formula: see text] accuracy and [Formula: see text] sensitivity, and its accuracy is higher by [Formula: see text]. In addition, we test our method on a very important PPI network, and it achieves [Formula: see text] accuracy. In the Wnt-related network, the accuracy of our method is increased by [Formula: see text]. The source code and all datasets are available at https://figshare.com/s/580c11dce13e63cb9a53.
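The SMR step named in the description can be illustrated with a minimal sketch. It assumes the commonly used definition in which row i of the SMR matrix is the row of a 20×20 amino-acid matrix belonging to the i-th residue of the sequence, so a sequence of length N yields an N×20 matrix. This record does not confirm that definition, and the names `AMINO_ACIDS`, `toy_contact_matrix`, and `smr_matrix` are hypothetical. The matrix values below are placeholders, not the AAC matrix from the paper, and the later HOG, SVD, and Random Forest steps are not shown.

```python
# Assumed residue ordering; the paper's own ordering is not given in this record.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_contact_matrix():
    """Return a hypothetical 20x20 placeholder matrix.

    The values are invented for illustration and stand in for the
    Amino Acid Contact (AAC) matrix, which is not reproduced here.
    """
    n = len(AMINO_ACIDS)
    return [[1.0 / (1 + abs(i - j)) for j in range(n)] for i in range(n)]


def smr_matrix(sequence, matrix):
    """Build an N x 20 matrix by looking up one matrix row per residue.

    Assumes SMR(i, j) = matrix[index of residue i][j].
    """
    index = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in sequence.upper():
        if residue not in index:
            raise ValueError(f"unsupported residue: {residue!r}")
        # Copy the row so later edits to the result do not alter the matrix.
        rows.append(list(matrix[index[residue]]))
    return rows


if __name__ == "__main__":
    smr = smr_matrix("ACD", toy_contact_matrix())
    print(len(smr), len(smr[0]))
```

Under this assumption every sequence maps to a matrix with a fixed number of columns but a variable number of rows, which is why a fixed-length feature vector still has to be extracted from it afterwards.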
format Online
Article
Text
id pubmed-5085656
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-5085656 2016-11-01 Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information Ding, Yijie Tang, Jijun Guo, Fei Int J Mol Sci Article MDPI 2016-09-24 /pmc/articles/PMC5085656/ /pubmed/27669239 http://dx.doi.org/10.3390/ijms17101623 Text en © 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Ding, Yijie
Tang, Jijun
Guo, Fei
Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information
title Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5085656/
https://www.ncbi.nlm.nih.gov/pubmed/27669239
http://dx.doi.org/10.3390/ijms17101623