Enhanced Prediction of Hot Spots at Protein-Protein Interfaces Using Extreme Gradient Boosting
Identification of hot spots, a small portion of protein-protein interface residues that contribute the majority of the binding free energy, can provide crucial information for understanding the function of proteins and studying their interactions. Based on our previous method (PredHS), we propose a...
Main authors: | Wang, Hao; Liu, Chuyao; Deng, Lei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6155324/ https://www.ncbi.nlm.nih.gov/pubmed/30250210 http://dx.doi.org/10.1038/s41598-018-32511-1 |
_version_ | 1783357876475002880 |
---|---|
author | Wang, Hao Liu, Chuyao Deng, Lei |
author_facet | Wang, Hao Liu, Chuyao Deng, Lei |
author_sort | Wang, Hao |
collection | PubMed |
description | Identification of hot spots, a small portion of protein-protein interface residues that contribute the majority of the binding free energy, can provide crucial information for understanding the function of proteins and studying their interactions. Based on our previous method (PredHS), we propose a new computational approach, PredHS2, that can further improve the accuracy of predicting hot spots at protein-protein interfaces. First, we build a new training dataset of 313 alanine-mutated interface residues extracted from 34 protein complexes. Then we generate a wide variety of 600 sequence, structure, exposure and energy features, together with Euclidean and Voronoi neighborhood properties. To remove redundant and irrelevant information, we select a set of 26 optimal features using a two-step feature selection method, which consists of a minimum Redundancy Maximum Relevance (mRMR) procedure and a sequential forward selection process. Based on the selected 26 features, we use Extreme Gradient Boosting (XGBoost) to build our prediction model. PredHS2 outperforms other machine learning algorithms and other state-of-the-art hot spot prediction methods on the training dataset and the independent test set (BID), respectively. Several novel features, such as solvent exposure characteristics, secondary structure features and disorder scores, are found to be more effective in discriminating hot spots. Moreover, the update of the training dataset and the new feature selection and classification algorithms play a vital role in improving the prediction quality. |
format | Online Article Text |
id | pubmed-6155324 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-6155324 2018-09-28 Enhanced Prediction of Hot Spots at Protein-Protein Interfaces Using Extreme Gradient Boosting Wang, Hao Liu, Chuyao Deng, Lei Sci Rep Article Nature Publishing Group UK 2018-09-24 /pmc/articles/PMC6155324/ /pubmed/30250210 http://dx.doi.org/10.1038/s41598-018-32511-1 Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Wang, Hao Liu, Chuyao Deng, Lei Enhanced Prediction of Hot Spots at Protein-Protein Interfaces Using Extreme Gradient Boosting |
title | Enhanced Prediction of Hot Spots at Protein-Protein Interfaces Using Extreme Gradient Boosting |
title_full | Enhanced Prediction of Hot Spots at Protein-Protein Interfaces Using Extreme Gradient Boosting |
title_fullStr | Enhanced Prediction of Hot Spots at Protein-Protein Interfaces Using Extreme Gradient Boosting |
title_full_unstemmed | Enhanced Prediction of Hot Spots at Protein-Protein Interfaces Using Extreme Gradient Boosting |
title_short | Enhanced Prediction of Hot Spots at Protein-Protein Interfaces Using Extreme Gradient Boosting |
title_sort | enhanced prediction of hot spots at protein-protein interfaces using extreme gradient boosting |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6155324/ https://www.ncbi.nlm.nih.gov/pubmed/30250210 http://dx.doi.org/10.1038/s41598-018-32511-1 |
work_keys_str_mv | AT wanghao enhancedpredictionofhotspotsatproteinproteininterfacesusingextremegradientboosting AT liuchuyao enhancedpredictionofhotspotsatproteinproteininterfacesusingextremegradientboosting AT denglei enhancedpredictionofhotspotsatproteinproteininterfacesusingextremegradientboosting |
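
The two-step feature selection named in the description (an mRMR ranking followed by sequential forward selection) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. Several things here are assumptions made only for illustration: absolute Pearson correlation stands in for mutual information, a leave-one-out nearest-centroid classifier stands in for XGBoost as the scoring model, and the toy data and all function names (`make_toy_data`, `mrmr_rank`, `forward_select`) are hypothetical.

```python
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def mrmr_rank(X, y, k):
    """Step 1: greedy mRMR-style ranking (relevance minus mean redundancy).

    Assumption: |Pearson r| replaces mutual information to avoid dependencies.
    """
    cols = [list(c) for c in zip(*X)]
    relevance = [abs(pearson(c, y)) for c in cols]
    selected, remaining = [], set(range(len(cols)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = mean(abs(pearson(cols[j], cols[s])) for s in selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for XGBoost)."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == label]
            centroids[label] = [mean(row[f] for row in rows) for f in feats]
        point = [X[i][f] for f in feats]
        pred = min(
            centroids,
            key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])),
        )
        correct += pred == y[i]
    return correct / len(X)


def forward_select(X, y, candidates):
    """Step 2: sequential forward selection over the ranked candidates.

    Each round adds the candidate that raises the score most; stops when none helps.
    """
    chosen, best, pool = [], 0.0, list(candidates)
    while pool:
        scored = [(loo_accuracy(X, y, chosen + [c]), c) for c in pool]
        acc, feat = max(scored, key=lambda t: t[0])
        if acc <= best:
            break
        chosen.append(feat)
        pool.remove(feat)
        best = acc
    return chosen, best


def make_toy_data(n=40, seed=0):
    """Hypothetical data: f0 informative, f1 a noisy copy of f0, f2 weakly informative, f3-f5 noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = []
    for label in y:
        f0 = 2 * label + rng.gauss(0, 0.3)
        row = [f0, f0 + rng.gauss(0, 0.1), label + rng.gauss(0, 1.0)]
        row += [rng.gauss(0, 1.0) for _ in range(3)]
        X.append(row)
    return X, y


X, y = make_toy_data()
ranked = mrmr_rank(X, y, k=4)
chosen, best = forward_select(X, y, ranked)
print("mRMR-ranked candidates:", ranked)
print("forward-selected subset:", chosen, "LOO accuracy:", round(best, 3))
```

In a setup closer to the one described, the scoring function passed through forward selection would presumably be a cross-validated XGBoost model rather than the nearest-centroid stand-in used here, and the candidate pool would come from the 600 generated features rather than six toy columns.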