Unbiasing Retrosynthesis Language Models with Disconnection Prompts
Data-driven approaches to retrosynthesis are limited in user interaction, diversity of their predictions, and recommendation of unintuitive disconnection strategies. Herein, we extend the notions of prompt-based inference in natural language processing to the task of chemical language modeling. We show that by using a prompt describing the disconnection site in a molecule we can steer the model to propose a broader set of precursors, thereby overcoming training data biases in retrosynthetic recommendations and achieving a 39% performance improvement over the baseline. For the first time, the use of a disconnection prompt empowers chemists by giving them greater control over the disconnection predictions, which results in more diverse and creative recommendations. In addition, in place of a human-in-the-loop strategy, we propose a two-stage schema consisting of automatic identification of disconnection sites, followed by prediction of reactant sets, thereby achieving a considerable improvement in class diversity compared with the baseline. The approach is effective in mitigating prediction biases derived from training data. This provides a wider variety of usable building blocks and improves the end user's digital experience. We demonstrate its application to different chemistry domains, from traditional to enzymatic reactions, in which substrate specificity is critical.
Main authors: Thakkar, Amol; Vaucher, Alain C.; Byekwaso, Andrea; Schwaller, Philippe; Toniato, Alessandra; Laino, Teodoro
Format: Online Article Text
Language: English
Published: American Chemical Society, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390024/ https://www.ncbi.nlm.nih.gov/pubmed/37529205 http://dx.doi.org/10.1021/acscentsci.3c00372
_version_ | 1785082390084845568 |
author | Thakkar, Amol; Vaucher, Alain C.; Byekwaso, Andrea; Schwaller, Philippe; Toniato, Alessandra; Laino, Teodoro |
author_facet | Thakkar, Amol; Vaucher, Alain C.; Byekwaso, Andrea; Schwaller, Philippe; Toniato, Alessandra; Laino, Teodoro |
author_sort | Thakkar, Amol |
collection | PubMed |
description | Data-driven approaches to retrosynthesis are limited in user interaction, diversity of their predictions, and recommendation of unintuitive disconnection strategies. Herein, we extend the notions of prompt-based inference in natural language processing to the task of chemical language modeling. We show that by using a prompt describing the disconnection site in a molecule we can steer the model to propose a broader set of precursors, thereby overcoming training data biases in retrosynthetic recommendations and achieving a 39% performance improvement over the baseline. For the first time, the use of a disconnection prompt empowers chemists by giving them greater control over the disconnection predictions, which results in more diverse and creative recommendations. In addition, in place of a human-in-the-loop strategy, we propose a two-stage schema consisting of automatic identification of disconnection sites, followed by prediction of reactant sets, thereby achieving a considerable improvement in class diversity compared with the baseline. The approach is effective in mitigating prediction biases derived from training data. This provides a wider variety of usable building blocks and improves the end user's digital experience. We demonstrate its application to different chemistry domains, from traditional to enzymatic reactions, in which substrate specificity is critical. |
format | Online Article Text |
id | pubmed-10390024 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-10390024 2023-08-01 Unbiasing Retrosynthesis Language Models with Disconnection Prompts Thakkar, Amol; Vaucher, Alain C.; Byekwaso, Andrea; Schwaller, Philippe; Toniato, Alessandra; Laino, Teodoro ACS Cent Sci Data-driven approaches to retrosynthesis are limited in user interaction, diversity of their predictions, and recommendation of unintuitive disconnection strategies. Herein, we extend the notions of prompt-based inference in natural language processing to the task of chemical language modeling. We show that by using a prompt describing the disconnection site in a molecule we can steer the model to propose a broader set of precursors, thereby overcoming training data biases in retrosynthetic recommendations and achieving a 39% performance improvement over the baseline. For the first time, the use of a disconnection prompt empowers chemists by giving them greater control over the disconnection predictions, which results in more diverse and creative recommendations. In addition, in place of a human-in-the-loop strategy, we propose a two-stage schema consisting of automatic identification of disconnection sites, followed by prediction of reactant sets, thereby achieving a considerable improvement in class diversity compared with the baseline. The approach is effective in mitigating prediction biases derived from training data. This provides a wider variety of usable building blocks and improves the end user's digital experience. We demonstrate its application to different chemistry domains, from traditional to enzymatic reactions, in which substrate specificity is critical. American Chemical Society 2023-07-05 /pmc/articles/PMC10390024/ /pubmed/37529205 http://dx.doi.org/10.1021/acscentsci.3c00372 Text en © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Thakkar, Amol; Vaucher, Alain C.; Byekwaso, Andrea; Schwaller, Philippe; Toniato, Alessandra; Laino, Teodoro; Unbiasing Retrosynthesis Language Models with Disconnection Prompts |
title | Unbiasing Retrosynthesis Language Models with Disconnection Prompts |
title_full | Unbiasing Retrosynthesis Language Models with Disconnection Prompts |
title_fullStr | Unbiasing Retrosynthesis Language Models with Disconnection Prompts |
title_full_unstemmed | Unbiasing Retrosynthesis Language Models with Disconnection Prompts |
title_short | Unbiasing Retrosynthesis Language Models with Disconnection Prompts |
title_sort | unbiasing retrosynthesis language models with disconnection prompts |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390024/ https://www.ncbi.nlm.nih.gov/pubmed/37529205 http://dx.doi.org/10.1021/acscentsci.3c00372 |
work_keys_str_mv | AT thakkaramol unbiasingretrosynthesislanguagemodelswithdisconnectionprompts AT vaucheralainc unbiasingretrosynthesislanguagemodelswithdisconnectionprompts AT byekwasoandrea unbiasingretrosynthesislanguagemodelswithdisconnectionprompts AT schwallerphilippe unbiasingretrosynthesislanguagemodelswithdisconnectionprompts AT toniatoalessandra unbiasingretrosynthesislanguagemodelswithdisconnectionprompts AT lainoteodoro unbiasingretrosynthesislanguagemodelswithdisconnectionprompts |