Rigorous assessment and integration of the sequence and structure based features to predict hot spots

BACKGROUND: Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Hot spot prediction has therefore become increasingly important for understanding the essence of proteins...

Bibliographic Details
Main authors: Chen, Ruoying; Chen, Wenjing; Yang, Sixiao; Wu, Di; Wang, Yong; Tian, Yingjie; Shi, Yong
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176265/
https://www.ncbi.nlm.nih.gov/pubmed/21798070
http://dx.doi.org/10.1186/1471-2105-12-311
author Chen, Ruoying
Chen, Wenjing
Yang, Sixiao
Wu, Di
Wang, Yong
Tian, Yingjie
Shi, Yong
collection PubMed
description BACKGROUND: Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Hot spot prediction has therefore become increasingly important for understanding the essence of protein interactions and for narrowing the search space for drug design. Many computational methods have been developed, each proposing different features. However, a comparative assessment of these features, as well as more effective and accurate methods, is still urgently needed.
RESULTS: In this study, we first comprehensively collect the features used to discriminate hot spots from non-hot spots and analyze their distributions. We find that hot spots have lower relASA and a larger relative change in ASA, suggesting that hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy do not differ significantly between hot spots and non-hot spots in the Ab+ dataset (all complexes), whereas in the Ab- dataset (antigen-antibody complexes excluded) these two features differ significantly between hot spots and non-hot spots. Second, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method by predicting hot spots of two protein complexes.
CONCLUSION: Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integrating features with machine learning methods can markedly improve the predictive performance for hot spots.
format Online
Article
Text
id pubmed-3176265
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3176265 2011-09-20 BMC Bioinformatics Research Article BioMed Central 2011-07-29 /pmc/articles/PMC3176265/ /pubmed/21798070 http://dx.doi.org/10.1186/1471-2105-12-311 Text en Copyright ©2011 Chen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Rigorous assessment and integration of the sequence and structure based features to predict hot spots
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176265/
https://www.ncbi.nlm.nih.gov/pubmed/21798070
http://dx.doi.org/10.1186/1471-2105-12-311