
Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study

Deep generative models have shown the ability to devise both valid and novel chemistry, which could significantly accelerate the identification of bioactive compounds. Many current models, however, use molecular descriptors or ligand-based predictive methods to guide molecule generation towards a de...


Bibliographic Details
Main authors:	Thomas, Morgan; Smith, Robert T.; O’Boyle, Noel M.; de Graaf, Chris; Bender, Andreas
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2021
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8117600/ https://www.ncbi.nlm.nih.gov/pubmed/33985583 http://dx.doi.org/10.1186/s13321-021-00516-0

author	Thomas, Morgan; Smith, Robert T.; O’Boyle, Noel M.; de Graaf, Chris; Bender, Andreas
author_sort	Thomas, Morgan
collection	PubMed
description	Deep generative models have shown the ability to devise both valid and novel chemistry, which could significantly accelerate the identification of bioactive compounds. Many current models, however, use molecular descriptors or ligand-based predictive methods to guide molecule generation towards a desirable property space. This restricts their application to relatively data-rich targets, neglecting those where little data is available to sufficiently train a predictor. Moreover, ligand-based approaches often bias molecule generation towards previously established chemical space, thereby limiting their ability to identify truly novel chemotypes. In this work, we assess the ability of using molecular docking via Glide—a structure-based approach—as a scoring function to guide the deep generative model REINVENT and compare model performance and behaviour to a ligand-based scoring function. Additionally, we modify the previously published MOSES benchmarking dataset to remove any induced bias towards non-protonatable groups. We also propose a new metric to measure dataset diversity, which is less confounded by the distribution of heavy atom count than the commonly used internal diversity metric. With respect to the main findings, we found that when optimizing the docking score against DRD2, the model improves predicted ligand affinity beyond that of known DRD2 active molecules. In addition, generated molecules occupy complementary chemical and physicochemical space compared to the ligand-based approach, and novel physicochemical space compared to known DRD2 active molecules. Furthermore, the structure-based approach learns to generate molecules that satisfy crucial residue interactions, which is information only available when taking protein structure into account. 
Overall, this work demonstrates the advantage of using molecular docking to guide de novo molecule generation over ligand-based predictors with respect to predicted affinity, novelty, and the ability to identify key interactions between ligand and protein target. Practically, this approach has applications in early hit generation campaigns to enrich a virtual library towards a particular target, and also in novelty-focused projects, where de novo molecule generation either has no prior ligand knowledge available or should not be biased by it. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-021-00516-0.
format	Online Article Text
id	pubmed-8117600
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-8117600 2021-05-13 Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study Thomas, Morgan; Smith, Robert T.; O’Boyle, Noel M.; de Graaf, Chris; Bender, Andreas J Cheminform Research Article Springer International Publishing 2021-05-13 /pmc/articles/PMC8117600/ /pubmed/33985583 http://dx.doi.org/10.1186/s13321-021-00516-0 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8117600/ https://www.ncbi.nlm.nih.gov/pubmed/33985583 http://dx.doi.org/10.1186/s13321-021-00516-0
