
Prediction of Protein Hotspots from Whole Protein Sequences by a Random Projection Ensemble System

Hotspot residues are important determinants of protein-protein interactions and perform specific functions in biological processes. They are usually identified by alanine scanning mutagenesis experiments, which are costly and time c...


Bibliographic Details
Main Authors: Jiang, Jinjian, Wang, Nian, Chen, Peng, Zheng, Chunhou, Wang, Bing
Format: Online Article Text
Language: English
Published: MDPI 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5536031/
https://www.ncbi.nlm.nih.gov/pubmed/28718782
http://dx.doi.org/10.3390/ijms18071543
_version_ 1783253951107301376
author Jiang, Jinjian
Wang, Nian
Chen, Peng
Zheng, Chunhou
Wang, Bing
collection PubMed
description Hotspot residues are important determinants of protein-protein interactions and perform specific functions in biological processes. They are usually identified by alanine scanning mutagenesis experiments, which are costly and time consuming. To address this issue, computational methods have been developed. Most of them are structure based, i.e., they use information from solved protein structures. However, far fewer protein structures than sequences have been solved. Moreover, almost all predictors identify hotspots within the interfaces of protein complexes and seldom from whole protein sequences. Determining hotspots from whole protein sequences using sequence information alone is therefore an urgent need. To address this problem, we propose an ensemble system with random projections that uses statistical physicochemical properties of amino acids. First, an encoding scheme is developed that combines sequence profiles of residues with physicochemical properties from the AAindex1 dataset. Next, the random projection technique is applied to project the encoded instances into a reduced space. Several of the better-performing random projections are then selected by training an IBk classifier on the training dataset and are applied to the test dataset, yielding an ensemble of random projection classifiers. Experimental results show that, although the performance of the method is not yet good enough for real applications, it is promising for determining hotspot residues from whole sequences.
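The pipeline outlined in the description (project the encoded instances, keep the projections on which a nearest-neighbour learner does best on the training data, then vote) can be illustrated with a minimal sketch. It is not the authors' implementation. IBk is Weka's k-nearest-neighbour learner, replaced here by a plain Euclidean k-NN. The Gaussian projection matrix, the leave-one-out selection criterion, the majority vote, and every name and default value (`fit_ensemble`, `d_out`, `n_candidates`, `n_keep`, `k`) are assumptions made for illustration, and the sequence-profile and AAindex1 encoding step is not reproduced.

```python
import random
from collections import Counter


def random_matrix(d_in, d_out, rng):
    """Gaussian random projection matrix of shape d_in x d_out (assumed form)."""
    scale = d_out ** -0.5  # preserves squared vector norms in expectation
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_out)]
            for _ in range(d_in)]


def project(x, R):
    """Map one feature vector x (length d_in) into the reduced space."""
    return [sum(xi * row[j] for xi, row in zip(x, R))
            for j in range(len(R[0]))]


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(train_X[i], x)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy of k-NN on the (projected) training set."""
    hits = 0
    for i in range(len(X)):
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        hits += pred == y[i]
    return hits / len(X)


def fit_ensemble(X, y, d_out=4, n_candidates=20, n_keep=5, k=3, seed=0):
    """Draw candidate projections and keep the n_keep best by training accuracy."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_candidates):
        R = random_matrix(len(X[0]), d_out, rng)
        Z = [project(x, R) for x in X]
        scored.append((loo_accuracy(Z, y, k), R, Z))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(R, Z) for _, R, Z in scored[:n_keep]]


def predict_ensemble(ensemble, y, x, k=3):
    """Majority vote over the k-NN predictions of the kept projections."""
    votes = [knn_predict(Z, y, project(x, R), k) for R, Z in ensemble]
    return Counter(votes).most_common(1)[0][0]
```

Calling `fit_ensemble(X, y)` on a list of encoded residue vectors `X` with 0/1 hotspot labels `y` returns the kept projections together with the projected training data, and `predict_ensemble(ensemble, y, x)` then labels a new residue vector `x`. Using an odd `n_keep` and an odd `k` avoids tied votes in the two-class case.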
format Online
Article
Text
id pubmed-5536031
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-5536031 2017-08-04 Prediction of Protein Hotspots from Whole Protein Sequences by a Random Projection Ensemble System Jiang, Jinjian Wang, Nian Chen, Peng Zheng, Chunhou Wang, Bing Int J Mol Sci Article MDPI 2017-07-18 /pmc/articles/PMC5536031/ /pubmed/28718782 http://dx.doi.org/10.3390/ijms18071543 Text en © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Prediction of Protein Hotspots from Whole Protein Sequences by a Random Projection Ensemble System
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5536031/
https://www.ncbi.nlm.nih.gov/pubmed/28718782
http://dx.doi.org/10.3390/ijms18071543