Unbiasing Retrosynthesis Language Models with Disconnection Prompts
Data-driven approaches to retrosynthesis are limited in user interaction, diversity of their predictions, and recommendation of unintuitive disconnection strategies. Herein, we extend the notions of prompt-based inference in natural language processing to the task of chemical language modeling. We show that by using a prompt describing the disconnection site in a molecule we can steer the model to propose a broader set of precursors, thereby overcoming training data biases in retrosynthetic recommendations and achieving a 39% performance improvement over the baseline. For the first time, the use of a disconnection prompt empowers chemists by giving them greater control over the disconnection predictions, which results in more diverse and creative recommendations. In addition, in place of a human-in-the-loop strategy, we propose a two-stage schema consisting of automatic identification of disconnection sites, followed by prediction of reactant sets, thereby achieving a considerable improvement in class diversity compared with the baseline. The approach is effective in mitigating prediction biases derived from training data. This provides a wider variety of usable building blocks and improves the end user's digital experience. We demonstrate its application to different chemistry domains, from traditional to enzymatic reactions, in which substrate specificity is critical.
Main authors: Thakkar, Amol; Vaucher, Alain C.; Byekwaso, Andrea; Schwaller, Philippe; Toniato, Alessandra; Laino, Teodoro
Format: Online Article Text
Language: English
Published: American Chemical Society, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390024/ https://www.ncbi.nlm.nih.gov/pubmed/37529205 http://dx.doi.org/10.1021/acscentsci.3c00372
_version_ | 1785082390084845568 |
author | Thakkar, Amol; Vaucher, Alain C.; Byekwaso, Andrea; Schwaller, Philippe; Toniato, Alessandra; Laino, Teodoro |
author_facet | Thakkar, Amol; Vaucher, Alain C.; Byekwaso, Andrea; Schwaller, Philippe; Toniato, Alessandra; Laino, Teodoro |
author_sort | Thakkar, Amol |
collection | PubMed |
description | Data-driven approaches to retrosynthesis are limited in user interaction, diversity of their predictions, and recommendation of unintuitive disconnection strategies. Herein, we extend the notions of prompt-based inference in natural language processing to the task of chemical language modeling. We show that by using a prompt describing the disconnection site in a molecule we can steer the model to propose a broader set of precursors, thereby overcoming training data biases in retrosynthetic recommendations and achieving a 39% performance improvement over the baseline. For the first time, the use of a disconnection prompt empowers chemists by giving them greater control over the disconnection predictions, which results in more diverse and creative recommendations. In addition, in place of a human-in-the-loop strategy, we propose a two-stage schema consisting of automatic identification of disconnection sites, followed by prediction of reactant sets, thereby achieving a considerable improvement in class diversity compared with the baseline. The approach is effective in mitigating prediction biases derived from training data. This provides a wider variety of usable building blocks and improves the end user's digital experience. We demonstrate its application to different chemistry domains, from traditional to enzymatic reactions, in which substrate specificity is critical. |
format | Online Article Text |
id | pubmed-10390024 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-10390024 2023-08-01 Unbiasing Retrosynthesis Language Models with Disconnection Prompts Thakkar, Amol; Vaucher, Alain C.; Byekwaso, Andrea; Schwaller, Philippe; Toniato, Alessandra; Laino, Teodoro ACS Cent Sci Data-driven approaches to retrosynthesis are limited in user interaction, diversity of their predictions, and recommendation of unintuitive disconnection strategies. Herein, we extend the notions of prompt-based inference in natural language processing to the task of chemical language modeling. We show that by using a prompt describing the disconnection site in a molecule we can steer the model to propose a broader set of precursors, thereby overcoming training data biases in retrosynthetic recommendations and achieving a 39% performance improvement over the baseline. For the first time, the use of a disconnection prompt empowers chemists by giving them greater control over the disconnection predictions, which results in more diverse and creative recommendations. In addition, in place of a human-in-the-loop strategy, we propose a two-stage schema consisting of automatic identification of disconnection sites, followed by prediction of reactant sets, thereby achieving a considerable improvement in class diversity compared with the baseline. The approach is effective in mitigating prediction biases derived from training data. This provides a wider variety of usable building blocks and improves the end user's digital experience. We demonstrate its application to different chemistry domains, from traditional to enzymatic reactions, in which substrate specificity is critical. American Chemical Society 2023-07-05 /pmc/articles/PMC10390024/ /pubmed/37529205 http://dx.doi.org/10.1021/acscentsci.3c00372 Text en © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Thakkar, Amol; Vaucher, Alain C.; Byekwaso, Andrea; Schwaller, Philippe; Toniato, Alessandra; Laino, Teodoro; Unbiasing Retrosynthesis Language Models with Disconnection Prompts |
title | Unbiasing Retrosynthesis Language Models with Disconnection Prompts |
title_full | Unbiasing Retrosynthesis Language Models with Disconnection Prompts |
title_fullStr | Unbiasing Retrosynthesis Language Models with Disconnection Prompts |
title_full_unstemmed | Unbiasing Retrosynthesis Language Models with Disconnection Prompts |
title_short | Unbiasing Retrosynthesis Language Models with Disconnection Prompts |
title_sort | unbiasing retrosynthesis language models with disconnection prompts |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390024/ https://www.ncbi.nlm.nih.gov/pubmed/37529205 http://dx.doi.org/10.1021/acscentsci.3c00372 |
work_keys_str_mv | AT thakkaramol unbiasingretrosynthesislanguagemodelswithdisconnectionprompts AT vaucheralainc unbiasingretrosynthesislanguagemodelswithdisconnectionprompts AT byekwasoandrea unbiasingretrosynthesislanguagemodelswithdisconnectionprompts AT schwallerphilippe unbiasingretrosynthesislanguagemodelswithdisconnectionprompts AT toniatoalessandra unbiasingretrosynthesislanguagemodelswithdisconnectionprompts AT lainoteodoro unbiasingretrosynthesislanguagemodelswithdisconnectionprompts |