Expanding Training Data for Structure-Based Receptor–Ligand Binding Affinity Regression through Imputation of Missing Labels
The success of machine learning is, in part, due to a large volume of data available to train models. However, the amount of training data for structure-based molecular property prediction remains limited. The previously described CrossDocked2020 data set expanded the available tra...
Main Authors: | Francoeur, Paul G.; Koes, David R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society, 2023 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634251/ https://www.ncbi.nlm.nih.gov/pubmed/37970017 http://dx.doi.org/10.1021/acsomega.3c05931 |
_version_ | 1785146196695711744 |
---|---|
author | Francoeur, Paul G.; Koes, David R. |
collection | PubMed |
description | The success of machine learning is, in part, due to a large volume of data available to train models. However, the amount of training data for structure-based molecular property prediction remains limited. The previously described CrossDocked2020 data set expanded the available training data for binding pose classification in a molecular docking setting but did not address expanding the amount of receptor–ligand binding affinity data. We present experiments demonstrating that imputing binding affinity labels for complexes without experimentally determined binding affinities is a viable approach to expanding training data for structure-based models of receptor–ligand binding affinity. In particular, we demonstrate that utilizing imputed labels from a convolutional neural network trained only on the affinity data present in CrossDocked2020 results in a small improvement in the binding affinity regression performance, despite the additional sources of noise that such imputed labels add to the training data. The code, data splits, and imputation labels utilized in this paper are freely available at https://github.com/francoep/ImputationPaper. |
format | Online Article Text |
id | pubmed-10634251 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-10634251 2023-11-15 ACS Omega American Chemical Society 2023-10-26 /pmc/articles/PMC10634251/ /pubmed/37970017 http://dx.doi.org/10.1021/acsomega.3c05931 Text en © 2023 The Authors. Published by American Chemical Society. License: https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use, including for commercial purposes, provided that author attribution and integrity are maintained. |
title | Expanding Training Data for Structure-Based Receptor–Ligand Binding Affinity Regression through Imputation of Missing Labels |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634251/ https://www.ncbi.nlm.nih.gov/pubmed/37970017 http://dx.doi.org/10.1021/acsomega.3c05931 |
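The description above summarizes the approach: fit a model on the complexes that have experimental affinities, use it to predict (impute) labels for the complexes that lack them, and then train on the expanded set. The following is a minimal, hypothetical sketch of that imputation step only. It is not the authors' code, which is available in the linked repository. The paper uses a convolutional neural network trained on the CrossDocked2020 affinity data; here a one-feature least-squares line stands in for that model purely for illustration. The names `Complex`, `fit_line`, and `impute_missing_labels`, and the single numeric `feature`, are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


# Hypothetical record type: `feature` is one number standing in for the
# structure-based input that a CNN would actually consume.
@dataclass
class Complex:
    name: str
    feature: float
    affinity: Optional[float]  # None when no experimental affinity exists
    imputed: bool = False      # True if `affinity` is a model prediction


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Ordinary least-squares fit of y = a + b*x (stand-in for the CNN)."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one labeled example")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def impute_missing_labels(data: List[Complex]) -> List[Complex]:
    """Fit on experimentally labeled complexes only, then label the rest."""
    labeled = [c for c in data if c.affinity is not None]
    model = fit_line([c.feature for c in labeled], [c.affinity for c in labeled])
    expanded = []
    for c in data:
        if c.affinity is None:
            # A prediction, not a measurement, so flag it as imputed.
            expanded.append(Complex(c.name, c.feature, model(c.feature), imputed=True))
        else:
            expanded.append(c)  # keep experimental labels unchanged
    return expanded


# Usage with toy values: three labeled complexes and one unlabeled one.
data = [
    Complex("a", 1.0, 3.0),
    Complex("b", 2.0, 5.0),
    Complex("c", 3.0, 7.0),
    Complex("d", 4.0, None),
]
expanded = impute_missing_labels(data)
print(expanded[-1].affinity, expanded[-1].imputed)  # 9.0 True
```

Keeping an `imputed` flag on each record lets a downstream training loop tell measured labels apart from predicted ones, for example to study the extra label noise that the description mentions.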