
A semi-supervised boosting SVM for predicting hot spots at protein-protein Interfaces

BACKGROUND: Hot spots are residues that contribute most of the binding free energy yet account for only a small portion of a protein interface. Experimental approaches to identifying hot spots, such as alanine scanning mutagenesis, are expensive and time-consuming, while computational methods are emerging as...


Bibliographic Details
Main Authors: Xu, Bin, Wei, Xiaoming, Deng, Lei, Guan, Jihong, Zhou, Shuigeng
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521187/
https://www.ncbi.nlm.nih.gov/pubmed/23282146
http://dx.doi.org/10.1186/1752-0509-6-S2-S6
author Xu, Bin
Wei, Xiaoming
Deng, Lei
Guan, Jihong
Zhou, Shuigeng
collection PubMed
description BACKGROUND: Hot spots are residues that contribute most of the binding free energy yet account for only a small portion of a protein interface. Experimental approaches to identifying hot spots, such as alanine scanning mutagenesis, are expensive and time-consuming, while computational methods are emerging as effective alternatives. RESULTS: In this study, we propose a semi-supervised boosting SVM, called sbSVM, to computationally predict hot spots at protein-protein interfaces by combining protein sequence and structure features. Feature selection is performed using random forests to avoid over-fitting. Because positive samples are scarce, our approach iteratively samples useful unlabeled data to boost the performance of hot spot prediction. The method is evaluated on a dataset generated from the ASEdb database for cross-validation and on a dataset from the BID database for independent testing. Furthermore, a balanced dataset with similar numbers of hot spots and non-hot spots (65 and 66, respectively), derived from the first training dataset, is used to further validate our method. All results show that our method yields good sensitivity, accuracy and F1 score compared with existing methods. CONCLUSION: Our method boosts the prediction performance for hot spots by using unlabeled data to overcome the deficiency of available training data. Experimental results show that our approach is more effective than traditional supervised algorithms and major existing hot spot prediction methods.
format Online
Article
Text
id pubmed-3521187
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3521187 2012-12-14 A semi-supervised boosting SVM for predicting hot spots at protein-protein Interfaces Xu, Bin Wei, Xiaoming Deng, Lei Guan, Jihong Zhou, Shuigeng BMC Syst Biol Proceedings BioMed Central 2012-12-12 /pmc/articles/PMC3521187/ /pubmed/23282146 http://dx.doi.org/10.1186/1752-0509-6-S2-S6 Text en Copyright ©2012 Xu et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A semi-supervised boosting SVM for predicting hot spots at protein-protein Interfaces
topic Proceedings