Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information

Identification of protein–protein interactions (PPIs) is a difficult and important problem in biology. Since experimental methods for predicting PPIs are both expensive and time-consuming, many computational methods have been developed to predict PPIs and interaction networks, which can be used to c...

Bibliographic Details
Main authors:	Ding, Yijie; Tang, Jijun; Guo, Fei
Format:	Online Article Text
Language:	English
Published:	MDPI 2016
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5085656/ https://www.ncbi.nlm.nih.gov/pubmed/27669239 http://dx.doi.org/10.3390/ijms17101623

author	Ding, Yijie; Tang, Jijun; Guo, Fei
collection	PubMed
description	Identification of protein–protein interactions (PPIs) is a difficult and important problem in biology. Since experimental methods for detecting PPIs are both expensive and time-consuming, many computational methods have been developed to predict PPIs and interaction networks, which can be used to complement experimental approaches. However, these methods have limitations: they require a large number of homologous proteins or a large body of literature to be applicable. In this paper, we propose a novel matrix-based protein sequence representation approach to predict PPIs, using an ensemble learning method for classification. We construct the Amino Acid Contact (AAC) matrix from a statistical analysis of residue-pairing frequencies in a database of 6323 protein–protein complexes. We first represent the protein sequence as a Substitution Matrix Representation (SMR) matrix. Then, the feature vector is extracted by applying the Histogram of Oriented Gradient (HOG) and Singular Value Decomposition (SVD) algorithms to the SMR matrix. Finally, we feed the feature vector into a Random Forest (RF) to distinguish interacting pairs from non-interacting pairs. Our method is applied to several PPI datasets to evaluate its performance. On the [Formula: see text] dataset, our method achieves [Formula: see text] accuracy and [Formula: see text] sensitivity. Compared with existing methods, the accuracy of our method is increased by [Formula: see text] percentage points. On the [Formula: see text] dataset, our method achieves [Formula: see text] accuracy and [Formula: see text] sensitivity, and its accuracy is increased by [Formula: see text]. On the [Formula: see text] PPI dataset, our method achieves [Formula: see text] accuracy and [Formula: see text] sensitivity, and its accuracy is increased by [Formula: see text]. In addition, we test our method on a very important PPI network, and it achieves [Formula: see text] accuracy. In the Wnt-related network, the accuracy of our method is increased by [Formula: see text]. The source code and all datasets are available at https://figshare.com/s/580c11dce13e63cb9a53.
format	Online Article Text
id	pubmed-5085656
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-5085656 2016-11-01 Int J Mol Sci Article MDPI 2016-09-24 /pmc/articles/PMC5085656/ /pubmed/27669239 http://dx.doi.org/10.3390/ijms17101623 Text en © 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
title	Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5085656/ https://www.ncbi.nlm.nih.gov/pubmed/27669239 http://dx.doi.org/10.3390/ijms17101623
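
The description in this record outlines a pipeline: represent each sequence as an SMR matrix, extract HOG and SVD features from it, and classify protein pairs with a Random Forest. The Python sketch below illustrates only the representation step and a simplified gradient-orientation histogram, under several assumptions that do not come from the record: the SMR is taken to be an N x 20 matrix whose i-th row is the matrix row of the i-th residue; `toy_contact_matrix` is a synthetic placeholder, not the authors' AAC matrix; the histogram is a single global one rather than a full cell-and-block HOG; the features of the two proteins are simply concatenated; and the two sequences are arbitrary strings. The SVD features and the Random Forest step are omitted.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_contact_matrix():
    """Placeholder 20x20 symmetric matrix (NOT the paper's AAC values)."""
    n = len(AMINO_ACIDS)
    return [[1.0 / (1 + abs(i - j)) for j in range(n)] for i in range(n)]


def smr(sequence, matrix):
    """Assumed SMR: row i is the matrix row of the i-th residue (N x 20)."""
    index = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    return [list(matrix[index[aa]]) for aa in sequence]


def orientation_histogram(mat, bins=9):
    """Simplified HOG: one global histogram of unsigned gradient
    orientations (0-180 degrees), weighted by gradient magnitude."""
    rows, cols = len(mat), len(mat[0])
    hist = [0.0] * bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = mat[r][c + 1] - mat[r][c - 1]  # central difference along columns
            gy = mat[r + 1][c] - mat[r - 1][c]  # central difference along rows
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            bin_index = min(int(angle / (180.0 / bins)), bins - 1)
            hist[bin_index] += magnitude
    total = sum(hist)
    # Normalise so the histogram sums to 1 (left as zeros if there is no gradient).
    return [h / total for h in hist] if total > 0 else hist


def pair_features(seq_a, seq_b, matrix, bins=9):
    """Assumed pairing scheme: concatenate the two per-protein histograms."""
    return (orientation_histogram(smr(seq_a, matrix), bins)
            + orientation_histogram(smr(seq_b, matrix), bins))


# Arbitrary example sequences, for illustration only.
features = pair_features("MKTAYIAKQR", "GSHMLEDPVW", toy_contact_matrix())
```

A vector such as `features` could then be passed, together with an interaction label, to any standard classifier; the record names a Random Forest for that step.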