
A Small Step Toward Generalizability: Training a Machine Learning Scoring Function for Structure-Based Virtual Screening



Bibliographic Details
Main authors:	Scantlebury, Jack; Vost, Lucy; Carbery, Anna; Hadfield, Thomas E.; Turnbull, Oliver M.; Brown, Nathan; Chenthamarakshan, Vijil; Das, Payel; Grosjean, Harold; von Delft, Frank; Deane, Charlotte M.
Format:	Online Article Text
Language:	English
Published:	American Chemical Society, 2023
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10207375/ https://www.ncbi.nlm.nih.gov/pubmed/37166179 http://dx.doi.org/10.1021/acs.jcim.3c00322

collection	PubMed
description	Over the past few years, many machine learning-based scoring functions for predicting the binding of small molecules to proteins have been developed. Their objective is to approximate the distribution which takes two molecules as input and outputs the energy of their interaction. Only a scoring function that accounts for the interatomic interactions involved in binding can accurately predict binding affinity on unseen molecules. However, many scoring functions make predictions based on data set biases rather than an understanding of the physics of binding. These scoring functions perform well when tested on similar targets to those in the training set but fail to generalize to dissimilar targets. To test what a machine learning-based scoring function has learned, input attribution, a technique for learning which features are important to a model when making a prediction on a particular data point, can be applied. If a model successfully learns something beyond data set biases, attribution should give insight into the important binding interactions that are taking place. We built a machine learning-based scoring function that aimed to avoid the influence of bias via thorough train and test data set filtering and show that it achieves comparable performance on the Comparative Assessment of Scoring Functions, 2016 (CASF-2016) benchmark to other leading methods. We then use the CASF-2016 test set to perform attribution and find that the bonds identified as important by PointVS, unlike those extracted from other scoring functions, have a high correlation with those found by a distance-based interaction profiler. We then show that attribution can be used to extract important binding pharmacophores from a given protein target when supplied with a number of bound structures. We use this information to perform fragment elaboration and see improvements in docking scores compared to using structural information from a traditional, data-based approach. This not only provides definitive proof that the scoring function has learned to identify some important binding interactions but also constitutes the first deep learning-based method for extracting structural information from a target for molecule design.
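The description contrasts atom pairs that a model's input attribution marks as important with those a distance-based interaction profiler finds. A minimal Python sketch of that kind of comparison follows. It assumes a toy pairwise-additive scoring function, occlusion (leave-one-pair-out) attribution, a hypothetical 4.0 Å contact cutoff, and a top-k overlap metric. Every function name, atom label and value here is hypothetical; none of it is the PointVS model, its attribution method, or the profiler used in the paper.

```python
"""Toy comparison of attribution scores with a distance-based contact profiler.

Hypothetical sketch only: the scoring function, the occlusion attribution,
the 4.0 angstrom cutoff and the top-k overlap metric are illustrative
assumptions, not PointVS or the profiler used in the paper.
"""
import math
from typing import Dict, Optional, Set, Tuple

Coord = Tuple[float, float, float]
Pair = Tuple[str, str]  # (protein atom label, ligand atom label)


def distance_profiler(protein: Dict[str, Coord], ligand: Dict[str, Coord],
                      cutoff: float = 4.0) -> Set[Pair]:
    """Flag every protein-ligand atom pair no farther apart than `cutoff`."""
    return {
        (p, l)
        for p, pc in protein.items()
        for l, lc in ligand.items()
        if math.dist(pc, lc) <= cutoff
    }


def toy_pair_term(d: float) -> float:
    """Hypothetical favourable (negative) term that decays with distance."""
    return -math.exp(-d)


def toy_score(protein: Dict[str, Coord], ligand: Dict[str, Coord],
              exclude: Optional[Pair] = None) -> float:
    """Pairwise-additive toy score; lower means more favourable binding."""
    total = 0.0
    for p, pc in protein.items():
        for l, lc in ligand.items():
            if (p, l) != exclude:
                total += toy_pair_term(math.dist(pc, lc))
    return total


def occlusion_attributions(protein: Dict[str, Coord],
                           ligand: Dict[str, Coord]) -> Dict[Pair, float]:
    """Importance of a pair = rise in the score when that pair is left out."""
    base = toy_score(protein, ligand)
    return {
        (p, l): toy_score(protein, ligand, exclude=(p, l)) - base
        for p in protein
        for l in ligand
    }


def top_k_overlap(attributions: Dict[Pair, float], contacts: Set[Pair],
                  k: int) -> float:
    """Fraction of the k highest-attributed pairs that the profiler also flags."""
    ranked = sorted(attributions, key=attributions.get, reverse=True)[:k]
    return sum(pair in contacts for pair in ranked) / k


if __name__ == "__main__":
    # Hypothetical two-atom protein and two-atom ligand (coordinates in angstroms).
    protein = {"ASP_OD1": (0.0, 0.0, 0.0), "LEU_CD1": (10.0, 0.0, 0.0)}
    ligand = {"N1": (3.0, 0.0, 0.0), "C5": (0.0, 3.5, 0.0)}
    contacts = distance_profiler(protein, ligand)
    attributions = occlusion_attributions(protein, ligand)
    print(top_k_overlap(attributions, contacts, k=2))  # prints 1.0
```

Because this toy score depends only on interatomic distance, its agreement with the profiler holds by construction. With a learned model, the overlap is an empirical question, which is the kind of check the description reports for PointVS.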
format	Online Article Text
id	pubmed-10207375
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	American Chemical Society
record_format	MEDLINE/PubMed
spelling	pubmed-10207375 2023-05-25; J Chem Inf Model; American Chemical Society 2023-05-11; © 2023 The Authors. Published by American Chemical Society. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/), which permits the broadest form of re-use, including for commercial purposes, provided that author attribution and integrity are maintained.
