
Expanding Training Data for Structure-Based Receptor–Ligand Binding Affinity Regression through Imputation of Missing Labels

Bibliographic Details
Main Authors: Francoeur, Paul G., Koes, David R.
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634251/
https://www.ncbi.nlm.nih.gov/pubmed/37970017
http://dx.doi.org/10.1021/acsomega.3c05931
author Francoeur, Paul G.
Koes, David R.
collection PubMed
description The success of machine learning is, in part, due to a large volume of data available to train models. However, the amount of training data for structure-based molecular property prediction remains limited. The previously described CrossDocked2020 data set expanded the available training data for binding pose classification in a molecular docking setting but did not address expanding the amount of receptor–ligand binding affinity data. We present experiments demonstrating that imputing binding affinity labels for complexes without experimentally determined binding affinities is a viable approach to expanding training data for structure-based models of receptor–ligand binding affinity. In particular, we demonstrate that utilizing imputed labels from a convolutional neural network trained only on the affinity data present in CrossDocked2020 results in a small improvement in the binding affinity regression performance, despite the additional sources of noise that such imputed labels add to the training data. The code, data splits, and imputation labels utilized in this paper are freely available at https://github.com/francoep/ImputationPaper.
format Online
Article
Text
id pubmed-10634251
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10634251 2023-11-15 Expanding Training Data for Structure-Based Receptor–Ligand Binding Affinity Regression through Imputation of Missing Labels Francoeur, Paul G. Koes, David R. ACS Omega American Chemical Society 2023-10-26 /pmc/articles/PMC10634251/ /pubmed/37970017 http://dx.doi.org/10.1021/acsomega.3c05931 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/).
title Expanding Training Data for Structure-Based Receptor–Ligand Binding Affinity Regression through Imputation of Missing Labels
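
The description above outlines a label-imputation workflow: train a model on the complexes that have measured affinities, use it to predict labels for the complexes that lack them, then retrain on the combined set. The following is a minimal sketch of that general idea only. It is not the authors' method: a closed-form ridge regression stands in for the convolutional neural network, the feature vectors and affinity values are synthetic, and every function name (`fit_ridge`, `predict`, `impute_and_retrain`) is hypothetical. The authors' actual code is in the repository linked in the description.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Returns a (weights, bias) tuple. Stand-in for the CNN in the paper.
    """
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - X_mean @ w


def predict(model, X):
    """Apply a (weights, bias) model to a feature matrix."""
    w, b = model
    return X @ w + b


def impute_and_retrain(X_lab, y_lab, X_unlab, alpha=1.0):
    """Impute missing labels with a teacher model, then retrain on all data."""
    # 1. Train only on examples with measured labels.
    teacher = fit_ridge(X_lab, y_lab, alpha)
    # 2. Predict (impute) labels for examples without measurements.
    y_imputed = predict(teacher, X_unlab)
    # 3. Retrain on measured plus imputed labels.
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_imputed])
    student = fit_ridge(X_all, y_all, alpha)
    return student, y_imputed


# Synthetic illustration: 40 labeled and 200 unlabeled feature vectors.
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])
X_lab = rng.normal(size=(40, 3))
y_lab = X_lab @ true_w + 6.0 + rng.normal(scale=0.1, size=40)
X_unlab = rng.normal(size=(200, 3))

student, y_imputed = impute_and_retrain(X_lab, y_lab, X_unlab)
```

With a linear model the retrained student stays close to the teacher, so this sketch shows the data flow rather than any performance gain; whether imputed labels help in practice depends on the model and data, as the description notes.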