
Exposing the Limitations of Molecular Machine Learning with Activity Cliffs

Machine learning has become a crucial tool in drug discovery and chemistry at large, e.g., to predict molecular properties, such as bioactivity, with high accuracy. However, activity cliffs, pairs of molecules that are highly similar in their structure but exhibit large differences...
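The definition of an activity cliff given above can be made concrete with a small sketch. The code below is illustrative only and is not taken from the article or from MoleculeACE: the function names, the use of plain feature sets in place of real molecular fingerprints, and both thresholds (Tanimoto similarity of at least 0.9, potency gap of at least one log unit, i.e. tenfold) are assumptions chosen for the example. It flags compounds that take part in at least one cliff pair and then computes the RMSE on those compounds only.

```python
from itertools import combinations
from math import sqrt


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_cliff_compounds(features, potency, min_similarity=0.9, min_delta=1.0):
    """Return indices of compounds involved in at least one activity cliff pair.

    Both thresholds are hypothetical defaults. Potency is assumed to be on a
    log scale (e.g. pKi), so a difference of 1.0 means a tenfold change.
    """
    cliffs = set()
    for i, j in combinations(range(len(features)), 2):
        similar = tanimoto(features[i], features[j]) >= min_similarity
        large_gap = abs(potency[i] - potency[j]) >= min_delta
        if similar and large_gap:
            cliffs.update((i, j))
    return cliffs


def rmse(y_true, y_pred, indices):
    """Root-mean-square error restricted to the given compound indices."""
    idx = list(indices)
    return sqrt(sum((y_true[i] - y_pred[i]) ** 2 for i in idx) / len(idx))


# Toy data: feature sets stand in for fingerprints (hypothetical values).
features = [
    frozenset(range(10)),          # compound 0
    frozenset(range(11)),          # compound 1: nearly identical to compound 0
    frozenset(range(20, 30)),      # compound 2: structurally unrelated
]
potency = [5.0, 7.5, 6.0]          # measured values on a log scale
predicted = [6.0, 6.5, 6.0]        # output of some hypothetical model

cliff_idx = find_cliff_compounds(features, potency)
rmse_all = rmse(potency, predicted, range(len(potency)))
rmse_cliff = rmse(potency, predicted, cliff_idx)
```

In this toy case the model's error on the cliff compounds is larger than its error over all compounds, which is the kind of gap a cliff-restricted metric is meant to expose.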


Bibliographic Details
Main authors:	van Tilborg, Derek; Alenicheva, Alisa; Grisoni, Francesca
Format:	Online Article Text
Language:	English
Published:	American Chemical Society 2022
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9749029/ https://www.ncbi.nlm.nih.gov/pubmed/36456532 http://dx.doi.org/10.1021/acs.jcim.2c01073

collection	PubMed
description	Machine learning has become a crucial tool in drug discovery and chemistry at large, e.g., to predict molecular properties, such as bioactivity, with high accuracy. However, activity cliffs, pairs of molecules that are highly similar in their structure but exhibit large differences in potency, have received limited attention for their effect on model performance. Not only are these edge cases informative for molecule discovery and optimization, but models that are well equipped to accurately predict the potency of activity cliffs also have increased potential for prospective applications. Our work aims to fill the current knowledge gap on best-practice machine learning methods in the presence of activity cliffs. We benchmarked a total of 24 machine and deep learning approaches on curated bioactivity data from 30 macromolecular targets for their performance on activity cliff compounds. While all methods struggled in the presence of activity cliffs, machine learning approaches based on molecular descriptors outperformed more complex deep learning methods. Our findings highlight large case-by-case differences in performance, advocating for (a) the inclusion of dedicated “activity-cliff-centered” metrics during model development and evaluation and (b) the development of novel algorithms to better predict the properties of activity cliffs. To this end, the methods, metrics, and results of this study have been encapsulated into an open-access benchmarking platform named MoleculeACE (Activity Cliff Estimation, available on GitHub at: https://github.com/molML/MoleculeACE). MoleculeACE is designed to steer the community toward addressing the pressing but overlooked limitation of molecular machine learning models posed by activity cliffs.
id	pubmed-9749029
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-9749029 2022-12-15 J Chem Inf Model American Chemical Society 2022-12-01 2022-12-12 /pmc/articles/PMC9749029/ /pubmed/36456532 http://dx.doi.org/10.1021/acs.jcim.2c01073 Text en © 2022 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained.