Machine Learning for Fast, Quantum Mechanics-Based Approximation of Drug Lipophilicity
Lipophilicity, as measured by the partition coefficient between octanol and water (log P), is a key parameter in early drug discovery research. However, measuring log P experimentally is difficult for specific compounds and log P ranges. The resulting lack of reliable experimental...
Main Authors: | Isert, Clemens; Kromann, Jimmy C.; Stiefl, Nikolaus; Schneider, Gisbert; Lewis, Richard A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2023 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9850743/ https://www.ncbi.nlm.nih.gov/pubmed/36687099 http://dx.doi.org/10.1021/acsomega.2c05607 |
_version_ | 1784872250361511936 |
---|---|
author | Isert, Clemens Kromann, Jimmy C. Stiefl, Nikolaus Schneider, Gisbert Lewis, Richard A. |
collection | PubMed |
description | Lipophilicity, as measured by the partition coefficient between octanol and water (log P), is a key parameter in early drug discovery research. However, measuring log P experimentally is difficult for specific compounds and log P ranges. The resulting lack of reliable experimental data impedes development of accurate in silico models for such compounds. In certain discovery projects at Novartis focused on such compounds, a quantum mechanics (QM)-based tool for log P estimation has emerged as a valuable supplement to experimental measurements and as a preferred alternative to existing empirical models. However, this QM-based approach incurs a substantial computational cost, limiting its applicability to small series and prohibiting quick, interactive ideation. This work explores a set of machine learning models (Random Forest, Lasso, XGBoost, Chemprop, and Chemprop3D) to learn calculated log P values on both a public data set and an in-house data set to obtain a computationally affordable, QM-based estimation of drug lipophilicity. The message-passing neural network model Chemprop emerged as the best performing model with mean absolute errors of 0.44 and 0.34 log units for scaffold split test sets of the public and in-house data sets, respectively. Analysis of learning curves suggests that a further decrease in the test set error can be achieved by increasing the training set size. While models directly trained on experimental data perform better at approximating experimentally determined log P values than models trained on calculated values, we discuss the potential advantages of using calculated log P values going beyond the limits of experimental quantitation. We analyze the impact of the data set splitting strategy and gain insights into model failure modes. Potential use cases for the presented models include pre-screening of large compound collections and prioritization of compounds for full QM calculations. |
format | Online Article Text |
id | pubmed-9850743 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-9850743 2023-01-20 Machine Learning for Fast, Quantum Mechanics-Based Approximation of Drug Lipophilicity Isert, Clemens Kromann, Jimmy C. Stiefl, Nikolaus Schneider, Gisbert Lewis, Richard A. ACS Omega American Chemical Society 2023-01-04 /pmc/articles/PMC9850743/ /pubmed/36687099 http://dx.doi.org/10.1021/acsomega.2c05607 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/). |
title | Machine Learning for Fast, Quantum Mechanics-Based Approximation of Drug Lipophilicity |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9850743/ https://www.ncbi.nlm.nih.gov/pubmed/36687099 http://dx.doi.org/10.1021/acsomega.2c05607 |
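
The description reports mean absolute errors (in log units) on scaffold split test sets. The following is a minimal sketch of those two ideas only, not the authors' pipeline. It assumes that scaffold keys have already been computed elsewhere (for example with a cheminformatics toolkit); the function names, the greedy group assignment, the toy records, and the mean-value baseline are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def scaffold_split(records, test_fraction=0.2):
    """Split records so that no scaffold key appears in both subsets.

    `records` is a list of (compound_id, scaffold_key, value) tuples.
    Scaffold keys are assumed to be precomputed (hypothetical input).
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[1]].append(record)

    # Visit the largest scaffold groups first; ties are broken by key name
    # so that the result is deterministic.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0][1]))

    test_target = round(len(records) * test_fraction)
    train_target = len(records) - test_target

    train, test = [], []
    for group in ordered:
        # A whole group goes to one side, never to both.
        if len(train) + len(group) <= train_target:
            train.extend(group)
        else:
            test.extend(group)
    return train, test


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: (compound id, scaffold key, log P-like value).
records = [
    ("c1", "benzene", 2.1),
    ("c2", "benzene", 2.4),
    ("c3", "benzene", 1.9),
    ("c4", "pyridine", 0.7),
    ("c5", "pyridine", 0.9),
    ("c6", "indole", 2.2),
    ("c7", "indole", 2.6),
    ("c8", "piperidine", 0.8),
    ("c9", "furan", 1.3),
    ("c10", "thiophene", 1.8),
]

train, test = scaffold_split(records, test_fraction=0.3)

# Trivial baseline standing in for a trained model: predict the train mean.
train_mean = sum(value for _, _, value in train) / len(train)
test_values = [value for _, _, value in test]
baseline_mae = mean_absolute_error(test_values, [train_mean] * len(test_values))
print(f"train={len(train)} test={len(test)} baseline MAE={baseline_mae:.2f}")
```

Because whole scaffold groups are assigned to one side, the test set contains only scaffolds the model has not seen, which is generally a harder evaluation than a random split of the same data.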