
Expanding Training Data for Structure-Based Receptor–Ligand Binding Affinity Regression through Imputation of Missing Labels


Bibliographic Details
Main Authors:	Francoeur, Paul G., Koes, David R.
Format:	Online Article Text
Language:	English
Published:	American Chemical Society 2023
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634251/ https://www.ncbi.nlm.nih.gov/pubmed/37970017 http://dx.doi.org/10.1021/acsomega.3c05931

Description
Summary:	The success of machine learning is, in part, due to a large volume of data available to train models. However, the amount of training data for structure-based molecular property prediction remains limited. The previously described CrossDocked2020 data set expanded the available training data for binding pose classification in a molecular docking setting but did not address expanding the amount of receptor–ligand binding affinity data. We present experiments demonstrating that imputing binding affinity labels for complexes without experimentally determined binding affinities is a viable approach to expanding training data for structure-based models of receptor–ligand binding affinity. In particular, we demonstrate that utilizing imputed labels from a convolutional neural network trained only on the affinity data present in CrossDocked2020 results in a small improvement in the binding affinity regression performance, despite the additional sources of noise that such imputed labels add to the training data. The code, data splits, and imputation labels utilized in this paper are freely available at https://github.com/francoep/ImputationPaper.
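
The imputation workflow summarized above (fit on labeled complexes, predict labels for unlabeled ones, refit on the expanded set) can be illustrated with a minimal sketch. This is not the authors' code: the paper uses a convolutional neural network on CrossDocked2020, while the sketch swaps in a one-variable least-squares line so it stays self-contained. The names `fit_line` and `impute_and_retrain`, and the single scalar feature per complex, are hypothetical.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (a stand-in for the paper's CNN)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def impute_and_retrain(features, labels):
    """labels[i] is None where no experimental affinity is available."""
    # Step 1: fit only on complexes with measured affinities.
    known = [(x, y) for x, y in zip(features, labels) if y is not None]
    a, b = fit_line([x for x, _ in known], [y for _, y in known])
    # Step 2: impute the missing labels with the fitted model's predictions.
    filled = [y if y is not None else a * x + b
              for x, y in zip(features, labels)]
    # Step 3: refit on the expanded (measured + imputed) training set.
    return filled, fit_line(features, filled)


# Hypothetical toy data: one scalar descriptor per complex.
filled, model = impute_and_retrain([1.0, 2.0, 3.0, 4.0], [2.0, None, 6.0, None])
```

In this toy linear case the imputed points lie exactly on the fitted line, so the refit model is unchanged; the sketch shows only the data flow, not the small improvement the paper reports for its neural network models.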

