Ensemble learning from ensemble docking: revisiting the optimum ensemble size problem

Despite considerable advances obtained by applying machine learning approaches in protein–ligand affinity predictions, the incorporation of receptor flexibility has remained an important bottleneck. While ensemble docking has been used widely as a solution to this problem, the optimum choice of rece...

Bibliographic Details
Main authors:	Mohammadi, Sara; Narimani, Zahra; Ashouri, Mitra; Firouzi, Rohoullah; Karimi‐Jafari, Mohammad Hossein
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8748946/ https://www.ncbi.nlm.nih.gov/pubmed/35013496 http://dx.doi.org/10.1038/s41598-021-04448-5

author	Mohammadi, Sara; Narimani, Zahra; Ashouri, Mitra; Firouzi, Rohoullah; Karimi‐Jafari, Mohammad Hossein
collection	PubMed
description	Despite considerable advances obtained by applying machine learning approaches in protein–ligand affinity predictions, the incorporation of receptor flexibility has remained an important bottleneck. While ensemble docking has been used widely as a solution to this problem, the optimum choice of receptor conformations is still an open question considering the issues related to the computational cost and false positive pose predictions. Here, a combination of ensemble learning and ensemble docking is suggested to rank different conformations of the target protein in light of their importance for the final accuracy of the model. Available X-ray structures of cyclin-dependent kinase 2 (CDK2) in complex with different ligands are used as an initial receptor ensemble, and its redundancy is removed through a graph-based redundancy removal, which is shown to be more efficient and less subjective than clustering-based representative selection methods. A set of ligands with available experimental affinity are docked to this nonredundant receptor ensemble, and the energetic features of the best scored poses are used in an ensemble learning procedure based on the random forest method. The importance of receptors is obtained through feature selection measures, and it is shown that a few of the most important conformations are sufficient to reach 1 kcal/mol accuracy in affinity prediction with considerable improvement of the early enrichment power of the models compared to the different ensemble docking without learning strategies. A clear strategy has been provided in which machine learning selects the most important experimental conformers of the receptor among a large set of protein–ligand complexes while simultaneously maintaining the final accuracy of affinity predictions at the highest level possible for available data. Our results could be informative for future attempts to design receptor-specific docking-rescoring strategies.
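The description mentions a graph-based redundancy removal step but does not say how the graph is built or pruned. The following is a minimal sketch of one possible reading, not the authors' procedure: conformations are nodes, two nodes are linked when a pairwise distance (assumed here to be an RMSD) falls below a cutoff, and a greedy pass keeps one representative per linked neighbourhood. The function names, the cutoff value, and the toy distances are all hypothetical.

```python
def build_similarity_graph(distances, cutoff):
    """Link two conformations when their pairwise distance is below the cutoff."""
    nodes = set()
    for a, b in distances:
        nodes.update((a, b))
    graph = {n: set() for n in nodes}
    for (a, b), d in distances.items():
        if d < cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def remove_redundancy(graph):
    """Greedy independent set: keep the most connected node, drop its neighbours."""
    remaining = {n: set(nb) for n, nb in graph.items()}
    kept = []
    while remaining:
        # Most neighbours first; ties are broken by name so the result is deterministic.
        best = min(remaining, key=lambda n: (-len(remaining[n]), n))
        kept.append(best)
        dropped = remaining[best] | {best}
        for n in dropped:
            remaining.pop(n, None)
        # Forget edges that pointed at nodes no longer under consideration.
        for nb in remaining.values():
            nb -= dropped
    return sorted(kept)


# Hypothetical pairwise distances (angstrom) between five made-up conformations.
toy_distances = {
    ("A", "B"): 0.4, ("A", "C"): 0.6, ("A", "D"): 2.5, ("A", "E"): 3.0,
    ("B", "C"): 0.5, ("B", "D"): 2.7, ("B", "E"): 3.1,
    ("C", "D"): 2.4, ("C", "E"): 2.9,
    ("D", "E"): 0.7,
}

graph = build_similarity_graph(toy_distances, cutoff=1.0)
representatives = remove_redundancy(graph)
print(representatives)
```

In this toy input, A, B and C are mutually similar and D and E are similar, so the sketch keeps one conformation from each group. Unlike clustering, this formulation needs only the cutoff and no preset number of clusters; whether that matches the efficiency and subjectivity claims in the description is for the article itself to establish.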
format	Online Article Text
id	pubmed-8748946
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-8748946 2022-01-13 Ensemble learning from ensemble docking: revisiting the optimum ensemble size problem. Mohammadi, Sara; Narimani, Zahra; Ashouri, Mitra; Firouzi, Rohoullah; Karimi‐Jafari, Mohammad Hossein. Sci Rep. Article. Nature Publishing Group UK 2022-01-10 /pmc/articles/PMC8748946/ /pubmed/35013496 http://dx.doi.org/10.1038/s41598-021-04448-5 Text en © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title	Ensemble learning from ensemble docking: revisiting the optimum ensemble size problem
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8748946/ https://www.ncbi.nlm.nih.gov/pubmed/35013496 http://dx.doi.org/10.1038/s41598-021-04448-5
