QSAR-derived affinity fingerprints (part 2): modeling performance for potency prediction

Affinity fingerprints report the activity of small molecules across a set of assays. They thus make it possible to gather information about the bioactivities of structurally dissimilar compounds, where models based on chemical structure alone are often limited, and to model complex biological endpoints, such as h...

Bibliographic Details
Main authors: Cortés-Ciriano, Isidro; Škuta, Ctibor; Bender, Andreas; Svozil, Daniel
Format: Online Article Text
Language: English
Published: Springer International Publishing 2020
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7339533/
https://www.ncbi.nlm.nih.gov/pubmed/33431016
http://dx.doi.org/10.1186/s13321-020-00444-5
_version_ 1783554912227950592
author Cortés-Ciriano, Isidro
Škuta, Ctibor
Bender, Andreas
Svozil, Daniel
collection PubMed
description Affinity fingerprints report the activity of small molecules across a set of assays. They thus make it possible to gather information about the bioactivities of structurally dissimilar compounds, where models based on chemical structure alone are often limited, and to model complex biological endpoints, such as human toxicity and in vitro cancer cell line sensitivity. Here, we propose to model in vitro compound activity using computationally predicted bioactivity profiles as compound descriptors. To this aim, we apply and validate a framework for the calculation of QSAR-derived affinity fingerprints (QAFFP) using a set of 1360 QSAR models generated using Ki, Kd, IC50 and EC50 data from the ChEMBL database. QAFFP thus represent a method to encode and relate compounds on the basis of their similarity in bioactivity space. To benchmark the predictive power of QAFFP, we assembled IC50 data from the ChEMBL database for 18 diverse cancer cell lines widely used in preclinical drug discovery, and for 25 diverse protein target data sets. This study complements part 1, where the performance of QAFFP in similarity searching, scaffold hopping, and bioactivity classification is evaluated. Although QAFFP are inherently noisy, we show that using them as descriptors leads to prediction errors on the test set in the range of ~0.65–0.95 pIC50 units, which are comparable to the estimated uncertainty of bioactivity data in ChEMBL (0.76–1.00 pIC50 units). We find that the predictive power of QAFFP is slightly worse than that of Morgan2 fingerprints and 1D and 2D physicochemical descriptors, with an effect size in the range of 0.02–0.08 pIC50 units. Including QSAR models with low predictive power in the generation of QAFFP does not improve predictive power. Given that the QSAR models we used to compute the QAFFP were selected on the basis of data availability alone, we anticipate better modeling results for QAFFP generated using more diverse and biologically meaningful targets. Data sets and Python code are publicly available at https://github.com/isidroc/QAFFP_regression.
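The idea in the abstract can be illustrated with a minimal Python sketch, under stated assumptions: a compound's affinity fingerprint is taken to be the vector of activities predicted for it by a panel of per-target models, and prediction quality is expressed as an error in pIC50 units. The toy linear "models", the feature values, and the function names (`pic50_from_nm`, `affinity_fingerprint`, `rmse`) are hypothetical and are not taken from the authors' repository. The record does not state which error metric the study used, so the root-mean-square error here is an assumption.

```python
import math
from typing import Callable, List, Sequence


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)


def affinity_fingerprint(
    features: Sequence[float],
    models: Sequence[Callable[[Sequence[float]], float]],
) -> List[float]:
    """Return one predicted activity per model; the vector is the fingerprint."""
    return [model(features) for model in models]


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical stand-ins for trained QSAR models (the study uses 1360 of them).
toy_models = [
    lambda x: 5.0 + 0.5 * x[0],
    lambda x: 6.0 - 0.25 * x[1],
    lambda x: 4.0 + 0.125 * (x[0] + x[1]),
]

if __name__ == "__main__":
    print(pic50_from_nm(100.0))                        # 7.0
    print(affinity_fingerprint([2.0, 4.0], toy_models))
    print(rmse([7.0, 6.0], [6.5, 6.5]))                # 0.5
```

In a real workflow the fingerprint vectors, rather than structural descriptors, would be passed to a downstream regression model trained on measured pIC50 values; that step is omitted here to keep the sketch dependency-free.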
format Online
Article
Text
id pubmed-7339533
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-7339533 2020-07-09 QSAR-derived affinity fingerprints (part 2): modeling performance for potency prediction. Cortés-Ciriano, Isidro; Škuta, Ctibor; Bender, Andreas; Svozil, Daniel. J Cheminform. Research Article. Springer International Publishing 2020-06-05 /pmc/articles/PMC7339533/ /pubmed/33431016 http://dx.doi.org/10.1186/s13321-020-00444-5 Text en
© The Author(s) 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title QSAR-derived affinity fingerprints (part 2): modeling performance for potency prediction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7339533/
https://www.ncbi.nlm.nih.gov/pubmed/33431016
http://dx.doi.org/10.1186/s13321-020-00444-5