The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions

Accurate prediction of reaction yield is the holy grail for computer-assisted synthesis prediction, but current models have failed to generalize to large literature datasets. To understand the causes and inspire future design, we systematically benchmarked the yield prediction task. We carefully cur...

Bibliographic Details
Main Authors: Liu, Zhen, Moroz, Yurii S., Isayev, Olexandr
Format: Online Article Text
Language: English
Published: The Royal Society of Chemistry 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10566507/
https://www.ncbi.nlm.nih.gov/pubmed/37829036
http://dx.doi.org/10.1039/d3sc03902a
_version_ 1785118927109488640
author Liu, Zhen
Moroz, Yurii S.
Isayev, Olexandr
author_facet Liu, Zhen
Moroz, Yurii S.
Isayev, Olexandr
author_sort Liu, Zhen
collection PubMed
description Accurate prediction of reaction yield is the holy grail for computer-assisted synthesis prediction, but current models have failed to generalize to large literature datasets. To understand the causes and inspire future design, we systematically benchmarked the yield prediction task. We carefully curated and augmented a literature dataset of 41 239 amide coupling reactions, each with information on reactants, products, intermediates, yields, and reaction contexts, and provided 3D structures for the molecules. We calculated molecular features related to 2D and 3D structure information, as well as physical and electronic properties. These descriptors were paired with 4 categories of machine learning methods (linear, kernel, ensemble, and neural network), yielding valuable benchmarks about feature and model performance. Despite the excellent performance on a high-throughput experiment (HTE) dataset (R² around 0.9), no method gave satisfactory results on the literature data. The best performance was an R² of 0.395 ± 0.020 using the stack technique. Error analysis revealed that reactivity cliff and yield uncertainty are among the main reasons for incorrect predictions. Removing reactivity cliffs and uncertain reactions boosted the R² to 0.457 ± 0.006. These results highlight that yield prediction models must be sensitive to the reactivity change due to the subtle structure variance, as well as be robust to the uncertainty associated with yield measurements.
format Online
Article
Text
id pubmed-10566507
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher The Royal Society of Chemistry
record_format MEDLINE/PubMed
spelling pubmed-105665072023-10-12 The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions Liu, Zhen Moroz, Yurii S. Isayev, Olexandr Chem Sci Chemistry Accurate prediction of reaction yield is the holy grail for computer-assisted synthesis prediction, but current models have failed to generalize to large literature datasets. To understand the causes and inspire future design, we systematically benchmarked the yield prediction task. We carefully curated and augmented a literature dataset of 41 239 amide coupling reactions, each with information on reactants, products, intermediates, yields, and reaction contexts, and provided 3D structures for the molecules. We calculated molecular features related to 2D and 3D structure information, as well as physical and electronic properties. These descriptors were paired with 4 categories of machine learning methods (linear, kernel, ensemble, and neural network), yielding valuable benchmarks about feature and model performance. Despite the excellent performance on a high-throughput experiment (HTE) dataset (R² around 0.9), no method gave satisfactory results on the literature data. The best performance was an R² of 0.395 ± 0.020 using the stack technique. Error analysis revealed that reactivity cliff and yield uncertainty are among the main reasons for incorrect predictions. Removing reactivity cliffs and uncertain reactions boosted the R² to 0.457 ± 0.006. These results highlight that yield prediction models must be sensitive to the reactivity change due to the subtle structure variance, as well as be robust to the uncertainty associated with yield measurements. The Royal Society of Chemistry 2023-09-13 /pmc/articles/PMC10566507/ /pubmed/37829036 http://dx.doi.org/10.1039/d3sc03902a Text en This journal is © The Royal Society of Chemistry https://creativecommons.org/licenses/by/3.0/
spellingShingle Chemistry
Liu, Zhen
Moroz, Yurii S.
Isayev, Olexandr
The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions
title The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions
title_full The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions
title_fullStr The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions
title_full_unstemmed The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions
title_short The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions
title_sort challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions
topic Chemistry
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10566507/
https://www.ncbi.nlm.nih.gov/pubmed/37829036
http://dx.doi.org/10.1039/d3sc03902a
work_keys_str_mv AT liuzhen thechallengeofbalancingmodelsensitivityandrobustnessinpredictingyieldsabenchmarkingstudyofamidecouplingreactions
AT morozyuriis thechallengeofbalancingmodelsensitivityandrobustnessinpredictingyieldsabenchmarkingstudyofamidecouplingreactions
AT isayevolexandr thechallengeofbalancingmodelsensitivityandrobustnessinpredictingyieldsabenchmarkingstudyofamidecouplingreactions
AT liuzhen challengeofbalancingmodelsensitivityandrobustnessinpredictingyieldsabenchmarkingstudyofamidecouplingreactions
AT morozyuriis challengeofbalancingmodelsensitivityandrobustnessinpredictingyieldsabenchmarkingstudyofamidecouplingreactions
AT isayevolexandr challengeofbalancingmodelsensitivityandrobustnessinpredictingyieldsabenchmarkingstudyofamidecouplingreactions