
The impact of cross-docked poses on performance of machine learning classifier for protein–ligand binding pose prediction

Structure-based drug design depends on detailed knowledge of the three-dimensional (3D) structures of protein–ligand binding complexes, but accurate prediction of ligand-binding poses remains a major challenge for molecular docking owing to deficiencies in scoring functions (SFs) and neglect of p...


Bibliographic Details
Main authors:	Shen, Chao; Hu, Xueping; Gao, Junbo; Zhang, Xujun; Zhong, Haiyang; Wang, Zhe; Xu, Lei; Kang, Yu; Cao, Dongsheng; Hou, Tingjun
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2021
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8520186/ https://www.ncbi.nlm.nih.gov/pubmed/34656169 http://dx.doi.org/10.1186/s13321-021-00560-w

author	Shen, Chao; Hu, Xueping; Gao, Junbo; Zhang, Xujun; Zhong, Haiyang; Wang, Zhe; Xu, Lei; Kang, Yu; Cao, Dongsheng; Hou, Tingjun
collection	PubMed
description	Structure-based drug design depends on detailed knowledge of the three-dimensional (3D) structures of protein–ligand binding complexes, but accurate prediction of ligand-binding poses remains a major challenge for molecular docking owing to deficiencies in scoring functions (SFs) and neglect of protein flexibility upon ligand binding. In this study, based on a cross-docking dataset constructed specifically from the PDBbind database, we developed several XGBoost-trained classifiers to discriminate near-native binding poses from decoys, and systematically assessed their performance with and without cross-docked poses in the training/test sets. The results show that using Extended Connectivity Interaction Features (ECIF), Vina energy terms and docking pose ranks as features achieves the best performance, both in validation with random or refined-core splitting and in testing on re-docked or cross-docked poses. In addition, although performance decreases significantly under threefold clustered cross-validation, including the Vina energy terms effectively secures a lower bound on model performance and thus improves generalization capability. Furthermore, our results highlight the importance of incorporating cross-docked poses into training in order to obtain SFs with a wide application domain and high robustness for binding pose prediction. The source code and the newly developed cross-docking datasets are freely available at https://github.com/sc8668/ml_pose_prediction and https://zenodo.org/record/5525936, respectively, under an open-source license. We believe that our study may provide valuable guidance for the development and assessment of new machine learning-based SFs (MLSFs) for the prediction of protein–ligand binding poses. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-021-00560-w.
format	Online Article Text
id	pubmed-8520186
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-8520186 2021-10-20 J Cheminform Research Article Springer International Publishing 2021-10-16 /pmc/articles/PMC8520186/ /pubmed/34656169 http://dx.doi.org/10.1186/s13321-021-00560-w Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	The impact of cross-docked poses on performance of machine learning classifier for protein–ligand binding pose prediction
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8520186/ https://www.ncbi.nlm.nih.gov/pubmed/34656169 http://dx.doi.org/10.1186/s13321-021-00560-w

Similar items