Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors
The extreme surge of interest over the past decade surrounding the use of neural networks has inspired many groups to deploy them for predicting binding affinities of drug-like molecules to their receptors. A model that can accurately make such predictions has the potential to screen large chemical...
Main authors: | Ong, Wern Juin Gabriel; Kirubakaran, Palani; Karanicolas, John |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10508770/ https://www.ncbi.nlm.nih.gov/pubmed/37732243 http://dx.doi.org/10.1101/2023.09.04.556234 |
_version_ | 1785107606647341056 |
---|---|
author | Ong, Wern Juin Gabriel Kirubakaran, Palani Karanicolas, John |
author_facet | Ong, Wern Juin Gabriel Kirubakaran, Palani Karanicolas, John |
author_sort | Ong, Wern Juin Gabriel |
collection | PubMed |
description | The extreme surge of interest over the past decade surrounding the use of neural networks has inspired many groups to deploy them for predicting binding affinities of drug-like molecules to their receptors. A model that can accurately make such predictions has the potential to screen large chemical libraries and help streamline the drug discovery process. However, despite reports of models that accurately predict quantitative inhibition using protein kinase sequences and inhibitors’ SMILES strings, it is still unclear whether these models can generalize to previously unseen data. Here, we build a Convolutional Neural Network (CNN) analogous to those previously reported and evaluate the model over four datasets commonly used for inhibitor/kinase predictions. We find that the model performs comparably to those previously reported, provided that the individual data points are randomly split between the training set and the test set. However, model performance is dramatically deteriorated when all data for a given inhibitor is placed together in the same training/testing fold, implying that information leakage underlies the models’ performance. Through comparison to simple models in which the SMILES strings are tokenized, or in which test set predictions are simply copied from the closest training set data points, we demonstrate that there is essentially no generalization whatsoever in this model. In other words, the model has not learned anything about molecular interactions, and does not provide any benefit over much simpler and more transparent models. These observations strongly point to the need for richer structure-based encodings, to obtain useful prospective predictions of not-yet-synthesized candidate inhibitors. |
format | Online Article Text |
id | pubmed-10508770 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10508770 2023-09-20 Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors Ong, Wern Juin Gabriel Kirubakaran, Palani Karanicolas, John bioRxiv Article
Cold Spring Harbor Laboratory 2023-09-06 /pmc/articles/PMC10508770/ /pubmed/37732243 http://dx.doi.org/10.1101/2023.09.04.556234 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
spellingShingle | Article Ong, Wern Juin Gabriel Kirubakaran, Palani Karanicolas, John Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors |
title | Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors |
title_full | Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors |
title_fullStr | Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors |
title_full_unstemmed | Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors |
title_short | Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors |
title_sort | poor generalization by current deep learning models for predicting binding affinities of kinase inhibitors |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10508770/ https://www.ncbi.nlm.nih.gov/pubmed/37732243 http://dx.doi.org/10.1101/2023.09.04.556234 |
work_keys_str_mv | AT ongwernjuingabriel poorgeneralizationbycurrentdeeplearningmodelsforpredictingbindingaffinitiesofkinaseinhibitors AT kirubakaranpalani poorgeneralizationbycurrentdeeplearningmodelsforpredictingbindingaffinitiesofkinaseinhibitors AT karanicolasjohn poorgeneralizationbycurrentdeeplearningmodelsforpredictingbindingaffinitiesofkinaseinhibitors |
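
The description above contrasts two ways of dividing the data: splitting individual data points at random, versus keeping all data for a given inhibitor in the same fold. The following is a minimal sketch of that difference, not the authors' code. The record layout, the function names (`random_split`, `inhibitor_grouped_split`, `shared_inhibitors`), the toy inhibitor and kinase labels, and the affinity values are all hypothetical and chosen only for illustration.

```python
import random


def random_split(records, test_fraction=0.25, seed=0):
    """Split individual (inhibitor, kinase, affinity) records at random."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def inhibitor_grouped_split(records, test_fraction=0.25, seed=0):
    """Hold out whole inhibitors so no test inhibitor appears in training."""
    rng = random.Random(seed)
    inhibitors = sorted({rec[0] for rec in records})
    rng.shuffle(inhibitors)
    n_test = max(1, int(len(inhibitors) * test_fraction))
    held_out = set(inhibitors[:n_test])
    train = [rec for rec in records if rec[0] not in held_out]
    test = [rec for rec in records if rec[0] in held_out]
    return train, test


def shared_inhibitors(train, test):
    """Return inhibitors that occur in both sets (a possible leakage route)."""
    return {rec[0] for rec in train} & {rec[0] for rec in test}


# Hypothetical toy data: 10 inhibitors measured against 5 kinases each.
records = [
    (f"inhibitor_{i}", f"kinase_{k}", 5.0 + 0.1 * i + 0.01 * k)
    for i in range(10)
    for k in range(5)
]

rand_train, rand_test = random_split(records)
grp_train, grp_test = inhibitor_grouped_split(records)

print("random split, inhibitors in both sets:",
      len(shared_inhibitors(rand_train, rand_test)))
print("grouped split, inhibitors in both sets:",
      len(shared_inhibitors(grp_train, grp_test)))
```

With a random split, the same inhibitor typically appears on both sides, so a model can score well by recalling what it saw for that inhibitor during training. The grouped split removes that overlap, which is the setting in which the description reports that performance deteriorates.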