Prediction of hot spots in protein–DNA binding interfaces based on supervised isometric feature mapping and extreme gradient boosting
BACKGROUND: Identification of hot spots in protein-DNA interfaces provides crucial information for the research on protein-DNA interaction and drug design. As experimental methods for determining hot spots are time-consuming, labor-intensive and expensive, there is a need for developing reliable com...
Main authors: | Li, Ke; Zhang, Sijia; Yan, Di; Bin, Yannan; Xia, Junfeng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7495874/ https://www.ncbi.nlm.nih.gov/pubmed/32938395 http://dx.doi.org/10.1186/s12859-020-03683-3 |
_version_ | 1783582978925920256 |
---|---|
author | Li, Ke; Zhang, Sijia; Yan, Di; Bin, Yannan; Xia, Junfeng |
author_sort | Li, Ke |
collection | PubMed |
description | BACKGROUND: Identification of hot spots in protein-DNA interfaces provides crucial information for research on protein-DNA interaction and drug design. As experimental methods for determining hot spots are time-consuming, labor-intensive and expensive, there is a need to develop a reliable computational method to predict hot spots on a large scale. RESULTS: Here, we proposed a new method named sxPDH, based on supervised isometric feature mapping (S-ISOMAP) and extreme gradient boosting (XGBoost), to predict hot spots in protein-DNA complexes. We obtained 114 features from a combination of protein sequence, structure, network and solvent-accessibility information, and systematically assessed various feature selection methods and manifold-learning-based dimensionality reduction methods. The results show that the S-ISOMAP method is superior to the other feature selection and manifold learning methods. XGBoost was then used to develop the hot spot prediction model sxPDH from the three dimensionality-reduced features obtained from S-ISOMAP. CONCLUSION: Our method sxPDH boosts prediction performance using S-ISOMAP and XGBoost. The AUC of the model is 0.773, and the F1 score is 0.713. Experimental results on a benchmark dataset indicate that sxPDH can achieve generally better performance in predicting hot spots than the state-of-the-art methods. |
format | Online Article Text |
id | pubmed-7495874 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7495874 2020-09-23 BMC Bioinformatics Methodology BioMed Central 2020-09-17 /pmc/articles/PMC7495874/ /pubmed/32938395 http://dx.doi.org/10.1186/s12859-020-03683-3 Text en © The Author(s) 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Prediction of hot spots in protein–DNA binding interfaces based on supervised isometric feature mapping and extreme gradient boosting |
topic | Methodology |
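
The description in this record reports two evaluation metrics for sxPDH, an AUC and an F1 score. As a reading aid, the minimal sketch below shows how these two quantities are commonly computed for a binary hot spot / non-hot spot classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: 1 = hot spot, 0 = non-hot spot.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]  # made-up classifier scores
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"F1  = {f1_score(labels, predicted):.3f}")
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

AUC depends only on how the scores rank positives against negatives, whereas F1 depends on the chosen decision threshold, which is one reason the two are usually reported together.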