
Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors

The extreme surge of interest over the past decade surrounding the use of neural networks has inspired many groups to deploy them for predicting binding affinities of drug-like molecules to their receptors. A model that can accurately make such predictions has the potential to screen large chemical...


Bibliographic Details
Main authors: Ong, Wern Juin Gabriel, Kirubakaran, Palani, Karanicolas, John
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10508770/
https://www.ncbi.nlm.nih.gov/pubmed/37732243
http://dx.doi.org/10.1101/2023.09.04.556234
_version_ 1785107606647341056
author Ong, Wern Juin Gabriel
Kirubakaran, Palani
Karanicolas, John
collection PubMed
description The extreme surge of interest over the past decade surrounding the use of neural networks has inspired many groups to deploy them for predicting binding affinities of drug-like molecules to their receptors. A model that can accurately make such predictions has the potential to screen large chemical libraries and help streamline the drug discovery process. However, despite reports of models that accurately predict quantitative inhibition using protein kinase sequences and inhibitors’ SMILES strings, it remains unclear whether these models can generalize to previously unseen data. Here, we build a Convolutional Neural Network (CNN) analogous to those previously reported and evaluate the model over four datasets commonly used for inhibitor/kinase predictions. We find that the model performs comparably to those previously reported, provided that the individual data points are randomly split between the training set and the test set. However, model performance deteriorates dramatically when all data for a given inhibitor are placed together in the same training/testing fold, implying that information leakage underlies the models’ performance. Through comparison to simple models in which the SMILES strings are tokenized, or in which test set predictions are simply copied from the closest training set data points, we demonstrate that there is essentially no generalization whatsoever in this model. In other words, the model has not learned anything about molecular interactions, and does not provide any benefit over much simpler and more transparent models. These observations strongly point to the need for richer structure-based encodings to obtain useful prospective predictions of not-yet-synthesized candidate inhibitors.
format Online
Article
Text
id pubmed-10508770
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10508770 2023-09-20 Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors Ong, Wern Juin Gabriel Kirubakaran, Palani Karanicolas, John bioRxiv Article
Cold Spring Harbor Laboratory 2023-09-06 /pmc/articles/PMC10508770/ /pubmed/37732243 http://dx.doi.org/10.1101/2023.09.04.556234 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10508770/
https://www.ncbi.nlm.nih.gov/pubmed/37732243
http://dx.doi.org/10.1101/2023.09.04.556234
work_keys_str_mv AT ongwernjuingabriel poorgeneralizationbycurrentdeeplearningmodelsforpredictingbindingaffinitiesofkinaseinhibitors
AT kirubakaranpalani poorgeneralizationbycurrentdeeplearningmodelsforpredictingbindingaffinitiesofkinaseinhibitors
AT karanicolasjohn poorgeneralizationbycurrentdeeplearningmodelsforpredictingbindingaffinitiesofkinaseinhibitors
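
Illustrative note on the two splitting schemes named in the description: the abstract contrasts a random split of individual data points with a split in which all data for a given inhibitor stay in the same fold. The sketch below is a minimal illustration of that difference only, not the authors' code. The toy records, the kinase names, the affinity values, and the function names are all hypothetical, invented for this example.

```python
import random
from collections import defaultdict

# Hypothetical toy records: (inhibitor SMILES, kinase name, affinity value).
# All values are invented for illustration; none come from the paper.
RECORDS = [
    ("CCO", "KIN_A", 5.1), ("CCO", "KIN_B", 5.3), ("CCO", "KIN_C", 4.9),
    ("c1ccccc1", "KIN_A", 7.2), ("c1ccccc1", "KIN_B", 7.0),
    ("CCN", "KIN_A", 6.0), ("CCN", "KIN_C", 6.4),
    ("CC(=O)O", "KIN_B", 4.2), ("CC(=O)O", "KIN_C", 4.0),
]


def random_split(records, test_fraction=0.3, seed=0):
    """Shuffle individual data points; one inhibitor can land on both sides."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def inhibitor_grouped_split(records, test_fraction=0.3, seed=0):
    """Assign whole inhibitors to train or test, so no inhibitor is shared."""
    by_inhibitor = defaultdict(list)
    for rec in records:
        by_inhibitor[rec[0]].append(rec)
    inhibitors = sorted(by_inhibitor)
    rng = random.Random(seed)
    rng.shuffle(inhibitors)
    n_test = max(1, round(len(inhibitors) * test_fraction))
    test_ids = set(inhibitors[:n_test])
    train = [r for r in records if r[0] not in test_ids]
    test = [r for r in records if r[0] in test_ids]
    return train, test


def shared_inhibitors(train, test):
    """Inhibitors that appear in both sets, i.e. a possible source of leakage."""
    return {r[0] for r in train} & {r[0] for r in test}


train, test = inhibitor_grouped_split(RECORDS)
print("grouped split, shared inhibitors:", shared_inhibitors(train, test))

train_r, test_r = random_split(RECORDS)
print("random split, shared inhibitors:", shared_inhibitors(train_r, test_r))
```

By construction the grouped split never shares an inhibitor between the two sets, whereas the random split usually does. That overlap is the kind of information leakage the description refers to.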