Machine Learning for Fast, Quantum Mechanics-Based Approximation of Drug Lipophilicity

Lipophilicity, as measured by the partition coefficient between octanol and water (log P), is a key parameter in early drug discovery research. However, measuring log P experimentally is difficult for specific compounds and log P ranges. The resulting lack of reliable experimental...

Bibliographic Details
Main authors:	Isert, Clemens; Kromann, Jimmy C.; Stiefl, Nikolaus; Schneider, Gisbert; Lewis, Richard A.
Format:	Online Article Text
Language:	English
Published:	American Chemical Society 2023
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9850743/ https://www.ncbi.nlm.nih.gov/pubmed/36687099 http://dx.doi.org/10.1021/acsomega.2c05607

author	Isert, Clemens; Kromann, Jimmy C.; Stiefl, Nikolaus; Schneider, Gisbert; Lewis, Richard A.
collection	PubMed
description	Lipophilicity, as measured by the partition coefficient between octanol and water (log P), is a key parameter in early drug discovery research. However, measuring log P experimentally is difficult for specific compounds and log P ranges. The resulting lack of reliable experimental data impedes development of accurate in silico models for such compounds. In certain discovery projects at Novartis focused on such compounds, a quantum mechanics (QM)-based tool for log P estimation has emerged as a valuable supplement to experimental measurements and as a preferred alternative to existing empirical models. However, this QM-based approach incurs a substantial computational cost, limiting its applicability to small series and prohibiting quick, interactive ideation. This work explores a set of machine learning models (Random Forest, Lasso, XGBoost, Chemprop, and Chemprop3D) to learn calculated log P values on both a public data set and an in-house data set to obtain a computationally affordable, QM-based estimation of drug lipophilicity. The message-passing neural network model Chemprop emerged as the best performing model with mean absolute errors of 0.44 and 0.34 log units for scaffold split test sets of the public and in-house data sets, respectively. Analysis of learning curves suggests that a further decrease in the test set error can be achieved by increasing the training set size. While models directly trained on experimental data perform better at approximating experimentally determined log P values than models trained on calculated values, we discuss the potential advantages of using calculated log P values going beyond the limits of experimental quantitation. We analyze the impact of the data set splitting strategy and gain insights into model failure modes. Potential use cases for the presented models include pre-screening of large compound collections and prioritization of compounds for full QM calculations.
format	Online Article Text
id	pubmed-9850743
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	American Chemical Society
record_format	MEDLINE/PubMed
spelling	pubmed-9850743 2023-01-20 Machine Learning for Fast, Quantum Mechanics-Based Approximation of Drug Lipophilicity. Isert, Clemens; Kromann, Jimmy C.; Stiefl, Nikolaus; Schneider, Gisbert; Lewis, Richard A. ACS Omega. American Chemical Society 2023-01-04 /pmc/articles/PMC9850743/ /pubmed/36687099 http://dx.doi.org/10.1021/acsomega.2c05607 Text en © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/).
title	Machine Learning for Fast, Quantum Mechanics-Based Approximation of Drug Lipophilicity
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9850743/ https://www.ncbi.nlm.nih.gov/pubmed/36687099 http://dx.doi.org/10.1021/acsomega.2c05607
