
Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios

The prediction of chemical reaction pathways has been accelerated by the development of novel machine learning architectures based on the deep learning paradigm. In this context, deep neural networks initially designed for language translation have been used to accurately predict a...


Bibliographic Details
Main authors: Jaume-Santero, Fernando; Bornet, Alban; Valery, Alain; Naderi, Nona; Vicente Alvarez, David; Proios, Dimitrios; Yazdani, Anthony; Bournez, Colin; Fessard, Thomas; Teodoro, Douglas
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10091402/
https://www.ncbi.nlm.nih.gov/pubmed/36952584
http://dx.doi.org/10.1021/acs.jcim.2c01407
_version_ 1785023126775529472
author Jaume-Santero, Fernando
Bornet, Alban
Valery, Alain
Naderi, Nona
Vicente Alvarez, David
Proios, Dimitrios
Yazdani, Anthony
Bournez, Colin
Fessard, Thomas
Teodoro, Douglas
author_sort Jaume-Santero, Fernando
collection PubMed
description The prediction of chemical reaction pathways has been accelerated by the development of novel machine learning architectures based on the deep learning paradigm. In this context, deep neural networks initially designed for language translation have been used to accurately predict a wide range of chemical reactions. Among models suited for the task of language translation, the recently introduced molecular transformer reached impressive performance in terms of forward-synthesis and retrosynthesis predictions. In this study, we first present an analysis of the performance of transformer models for product, reactant, and reagent prediction tasks under different scenarios of data availability and data augmentation. We find that the impact of data augmentation depends on the prediction task and on the metric used to evaluate the model performance. Second, we probe the contribution of different combinations of input formats, tokenization schemes, and embedding strategies to model performance. We find that less stable input settings generally lead to better performance. Lastly, we validate the superiority of round-trip accuracy over simpler evaluation metrics, such as top-k accuracy, using a committee of human experts and show a strong agreement for predictions that pass the round-trip test. This demonstrates the usefulness of more elaborate metrics in complex predictive scenarios and highlights the limitations of direct comparisons to a predefined database, which may include a limited number of chemical reaction pathways.
format Online
Article
Text
id pubmed-10091402
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10091402 2023-04-13 Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios Jaume-Santero, Fernando Bornet, Alban Valery, Alain Naderi, Nona Vicente Alvarez, David Proios, Dimitrios Yazdani, Anthony Bournez, Colin Fessard, Thomas Teodoro, Douglas J Chem Inf Model
American Chemical Society 2023-03-23 /pmc/articles/PMC10091402/ /pubmed/36952584 http://dx.doi.org/10.1021/acs.jcim.2c01407 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/).
title Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10091402/
https://www.ncbi.nlm.nih.gov/pubmed/36952584
http://dx.doi.org/10.1021/acs.jcim.2c01407