
Exploring QSAR models for activity-cliff prediction

INTRODUCTION AND METHODOLOGY: Pairs of similar compounds that only differ by a small structural modification but exhibit a large difference in their binding affinity for a given target are known as activity cliffs (ACs). It has been hypothesised that QSAR models struggle to predict ACs and that ACs...


Bibliographic Details
Main authors: Dablander, Markus, Hanser, Thierry, Lambiotte, Renaud, Morris, Garrett M.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10107580/
https://www.ncbi.nlm.nih.gov/pubmed/37069675
http://dx.doi.org/10.1186/s13321-023-00708-w
_version_ 1785026637016858624
author Dablander, Markus
Hanser, Thierry
Lambiotte, Renaud
Morris, Garrett M.
author_facet Dablander, Markus
Hanser, Thierry
Lambiotte, Renaud
Morris, Garrett M.
author_sort Dablander, Markus
collection PubMed
description INTRODUCTION AND METHODOLOGY: Pairs of similar compounds that only differ by a small structural modification but exhibit a large difference in their binding affinity for a given target are known as activity cliffs (ACs). It has been hypothesised that QSAR models struggle to predict ACs and that ACs thus form a major source of prediction error. However, the AC-prediction power of modern QSAR methods and its quantitative relationship to general QSAR-prediction performance is still underexplored. We systematically construct nine distinct QSAR models by combining three molecular representation methods (extended-connectivity fingerprints, physicochemical-descriptor vectors and graph isomorphism networks) with three regression techniques (random forests, k-nearest neighbours and multilayer perceptrons); we then use each resulting model to classify pairs of similar compounds as ACs or non-ACs and to predict the activities of individual molecules in three case studies: dopamine receptor D2, factor Xa, and SARS-CoV-2 main protease. RESULTS AND CONCLUSIONS: Our results provide strong support for the hypothesis that indeed QSAR models frequently fail to predict ACs. We observe low AC-sensitivity amongst the evaluated models when the activities of both compounds are unknown, but a substantial increase in AC-sensitivity when the actual activity of one of the compounds is given. Graph isomorphism features are found to be competitive with or superior to classical molecular representations for AC-classification and can thus be employed as baseline AC-prediction models or simple compound-optimisation tools. For general QSAR-prediction, however, extended-connectivity fingerprints still consistently deliver the best performance amongst the tested input representations. A potential future pathway to improve QSAR-modelling performance might be the development of techniques to increase AC-sensitivity. GRAPHICAL ABSTRACT: [Image: see text]
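The abstract describes using a regression-type QSAR model to classify compound pairs as ACs or non-ACs, and reports AC-sensitivity in two settings: both activities predicted, or one activity known. The following is a minimal sketch of that idea, not the authors' implementation. The threshold value, the function names, and all activity numbers are assumptions invented for illustration; the record does not state how the paper defines a "large" activity difference.

```python
from typing import List, Tuple

Pair = Tuple[float, float]

# Assumption: a pair is an activity cliff (AC) when the absolute activity
# difference exceeds this threshold. The value is hypothetical.
AC_THRESHOLD = 2.0


def is_activity_cliff(act_a: float, act_b: float,
                      threshold: float = AC_THRESHOLD) -> bool:
    """Label a pair of similar compounds as AC or non-AC."""
    return abs(act_a - act_b) > threshold


def ac_sensitivity(true_pairs: List[Pair], est_pairs: List[Pair],
                   threshold: float = AC_THRESHOLD) -> float:
    """Fraction of true ACs that are also flagged as ACs from the estimates."""
    true_flags = [is_activity_cliff(a, b, threshold) for a, b in true_pairs]
    est_flags = [is_activity_cliff(a, b, threshold) for a, b in est_pairs]
    n_true = sum(true_flags)
    if n_true == 0:
        raise ValueError("no true activity cliffs in the data")
    hits = sum(t and e for t, e in zip(true_flags, est_flags))
    return hits / n_true


# Toy data, made up for this sketch (not results from the article).
true_acts = [(9.0, 6.0), (8.5, 5.5), (7.0, 6.8), (6.0, 6.1)]
pred_acts = [(8.0, 6.9), (7.9, 5.6), (7.1, 6.7), (6.2, 6.0)]

# Setting 1: both activities come from the model.
both_predicted = pred_acts

# Setting 2: the first compound's true activity is known,
# only the second compound's activity is predicted.
one_known = [(t[0], p[1]) for t, p in zip(true_acts, pred_acts)]

sens_both = ac_sensitivity(true_acts, both_predicted)
sens_one = ac_sensitivity(true_acts, one_known)
print(sens_both, sens_one)
```

In this toy data the model shrinks the predicted difference for the first pair, so the cliff is missed when both activities are predicted but recovered once one true activity is supplied. The numbers were constructed to show that mechanism and carry no evidential weight.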
format Online
Article
Text
id pubmed-10107580
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-101075802023-04-18 Exploring QSAR models for activity-cliff prediction Dablander, Markus Hanser, Thierry Lambiotte, Renaud Morris, Garrett M. J Cheminform Research INTRODUCTION AND METHODOLOGY: Pairs of similar compounds that only differ by a small structural modification but exhibit a large difference in their binding affinity for a given target are known as activity cliffs (ACs). It has been hypothesised that QSAR models struggle to predict ACs and that ACs thus form a major source of prediction error. However, the AC-prediction power of modern QSAR methods and its quantitative relationship to general QSAR-prediction performance is still underexplored. We systematically construct nine distinct QSAR models by combining three molecular representation methods (extended-connectivity fingerprints, physicochemical-descriptor vectors and graph isomorphism networks) with three regression techniques (random forests, k-nearest neighbours and multilayer perceptrons); we then use each resulting model to classify pairs of similar compounds as ACs or non-ACs and to predict the activities of individual molecules in three case studies: dopamine receptor D2, factor Xa, and SARS-CoV-2 main protease. RESULTS AND CONCLUSIONS: Our results provide strong support for the hypothesis that indeed QSAR models frequently fail to predict ACs. We observe low AC-sensitivity amongst the evaluated models when the activities of both compounds are unknown, but a substantial increase in AC-sensitivity when the actual activity of one of the compounds is given. Graph isomorphism features are found to be competitive with or superior to classical molecular representations for AC-classification and can thus be employed as baseline AC-prediction models or simple compound-optimisation tools. For general QSAR-prediction, however, extended-connectivity fingerprints still consistently deliver the best performance amongst the tested input representations.
A potential future pathway to improve QSAR-modelling performance might be the development of techniques to increase AC-sensitivity. GRAPHICAL ABSTRACT: [Image: see text] Springer International Publishing 2023-04-17 /pmc/articles/PMC10107580/ /pubmed/37069675 http://dx.doi.org/10.1186/s13321-023-00708-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Dablander, Markus
Hanser, Thierry
Lambiotte, Renaud
Morris, Garrett M.
Exploring QSAR models for activity-cliff prediction
title Exploring QSAR models for activity-cliff prediction
title_full Exploring QSAR models for activity-cliff prediction
title_fullStr Exploring QSAR models for activity-cliff prediction
title_full_unstemmed Exploring QSAR models for activity-cliff prediction
title_short Exploring QSAR models for activity-cliff prediction
title_sort exploring qsar models for activity-cliff prediction
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10107580/
https://www.ncbi.nlm.nih.gov/pubmed/37069675
http://dx.doi.org/10.1186/s13321-023-00708-w
work_keys_str_mv AT dablandermarkus exploringqsarmodelsforactivitycliffprediction
AT hanserthierry exploringqsarmodelsforactivitycliffprediction
AT lambiotterenaud exploringqsarmodelsforactivitycliffprediction
AT morrisgarrettm exploringqsarmodelsforactivitycliffprediction