Prediction of hot spots in protein–DNA binding interfaces based on supervised isometric feature mapping and extreme gradient boosting

Bibliographic Details
Main authors: Li, Ke; Zhang, Sijia; Yan, Di; Bin, Yannan; Xia, Junfeng
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7495874/
https://www.ncbi.nlm.nih.gov/pubmed/32938395
http://dx.doi.org/10.1186/s12859-020-03683-3
_version_ 1783582978925920256
author Li, Ke
Zhang, Sijia
Yan, Di
Bin, Yannan
Xia, Junfeng
collection PubMed
description BACKGROUND: Identification of hot spots in protein-DNA interfaces provides crucial information for research on protein-DNA interactions and drug design. As experimental methods for determining hot spots are time-consuming, labor-intensive and expensive, there is a need to develop reliable computational methods to predict hot spots on a large scale. RESULTS: Here, we proposed a new method named sxPDH based on supervised isometric feature mapping (S-ISOMAP) and extreme gradient boosting (XGBoost) to predict hot spots in protein-DNA complexes. We obtained 114 features from a combination of protein sequence, structure, network and solvent accessibility information, and systematically assessed various feature selection methods and manifold-learning-based dimensionality reduction methods. The results show that the S-ISOMAP method is superior to the other feature selection and manifold learning methods. XGBoost was then used to develop the hot spot prediction model sxPDH based on the three dimensionality-reduced features obtained from S-ISOMAP. CONCLUSION: Our method sxPDH boosts prediction performance using S-ISOMAP and XGBoost. The AUC of the model is 0.773, and the F1 score is 0.713. Experimental results on a benchmark dataset indicate that sxPDH can achieve generally better performance in predicting hot spots compared with state-of-the-art methods.
format Online
Article
Text
id pubmed-7495874
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7495874 2020-09-23 Prediction of hot spots in protein–DNA binding interfaces based on supervised isometric feature mapping and extreme gradient boosting Li, Ke; Zhang, Sijia; Yan, Di; Bin, Yannan; Xia, Junfeng BMC Bioinformatics Methodology BACKGROUND: Identification of hot spots in protein-DNA interfaces provides crucial information for research on protein-DNA interactions and drug design. As experimental methods for determining hot spots are time-consuming, labor-intensive and expensive, there is a need to develop reliable computational methods to predict hot spots on a large scale. RESULTS: Here, we proposed a new method named sxPDH based on supervised isometric feature mapping (S-ISOMAP) and extreme gradient boosting (XGBoost) to predict hot spots in protein-DNA complexes. We obtained 114 features from a combination of protein sequence, structure, network and solvent accessibility information, and systematically assessed various feature selection methods and manifold-learning-based dimensionality reduction methods. The results show that the S-ISOMAP method is superior to the other feature selection and manifold learning methods. XGBoost was then used to develop the hot spot prediction model sxPDH based on the three dimensionality-reduced features obtained from S-ISOMAP. CONCLUSION: Our method sxPDH boosts prediction performance using S-ISOMAP and XGBoost. The AUC of the model is 0.773, and the F1 score is 0.713. Experimental results on a benchmark dataset indicate that sxPDH can achieve generally better performance in predicting hot spots compared with state-of-the-art methods.
BioMed Central 2020-09-17 /pmc/articles/PMC7495874/ /pubmed/32938395 http://dx.doi.org/10.1186/s12859-020-03683-3 Text en © The Author(s) 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Prediction of hot spots in protein–DNA binding interfaces based on supervised isometric feature mapping and extreme gradient boosting
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7495874/
https://www.ncbi.nlm.nih.gov/pubmed/32938395
http://dx.doi.org/10.1186/s12859-020-03683-3