Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios
The prediction of chemical reaction pathways has been accelerated by the development of novel machine learning architectures based on the deep learning paradigm. In this context, deep neural networks initially designed for language translation have been used to accurately predict a...
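Main authors: | Jaume-Santero, Fernando; Bornet, Alban; Valery, Alain; Naderi, Nona; Vicente Alvarez, David; Proios, Dimitrios; Yazdani, Anthony; Bournez, Colin; Fessard, Thomas; Teodoro, Douglas |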
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10091402/ https://www.ncbi.nlm.nih.gov/pubmed/36952584 http://dx.doi.org/10.1021/acs.jcim.2c01407 |
_version_ | 1785023126775529472 |
---|---|
author | Jaume-Santero, Fernando Bornet, Alban Valery, Alain Naderi, Nona Vicente Alvarez, David Proios, Dimitrios Yazdani, Anthony Bournez, Colin Fessard, Thomas Teodoro, Douglas |
author_facet | Jaume-Santero, Fernando Bornet, Alban Valery, Alain Naderi, Nona Vicente Alvarez, David Proios, Dimitrios Yazdani, Anthony Bournez, Colin Fessard, Thomas Teodoro, Douglas |
author_sort | Jaume-Santero, Fernando |
collection | PubMed |
description | The prediction of chemical reaction pathways has been accelerated by the development of novel machine learning architectures based on the deep learning paradigm. In this context, deep neural networks initially designed for language translation have been used to accurately predict a wide range of chemical reactions. Among models suited for the task of language translation, the recently introduced molecular transformer reached impressive performance in terms of forward-synthesis and retrosynthesis predictions. In this study, we first present an analysis of the performance of transformer models for product, reactant, and reagent prediction tasks under different scenarios of data availability and data augmentation. We find that the impact of data augmentation depends on the prediction task and on the metric used to evaluate the model performance. Second, we probe the contribution of different combinations of input formats, tokenization schemes, and embedding strategies to model performance. We find that less stable input settings generally lead to better performance. Lastly, we validate the superiority of round-trip accuracy over simpler evaluation metrics, such as top-k accuracy, using a committee of human experts and show a strong agreement for predictions that pass the round-trip test. This demonstrates the usefulness of more elaborate metrics in complex predictive scenarios and highlights the limitations of direct comparisons to a predefined database, which may include a limited number of chemical reaction pathways. |
format | Online Article Text |
id | pubmed-10091402 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-10091402 2023-04-13 Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios Jaume-Santero, Fernando Bornet, Alban Valery, Alain Naderi, Nona Vicente Alvarez, David Proios, Dimitrios Yazdani, Anthony Bournez, Colin Fessard, Thomas Teodoro, Douglas J Chem Inf Model The prediction of chemical reaction pathways has been accelerated by the development of novel machine learning architectures based on the deep learning paradigm. In this context, deep neural networks initially designed for language translation have been used to accurately predict a wide range of chemical reactions. Among models suited for the task of language translation, the recently introduced molecular transformer reached impressive performance in terms of forward-synthesis and retrosynthesis predictions. In this study, we first present an analysis of the performance of transformer models for product, reactant, and reagent prediction tasks under different scenarios of data availability and data augmentation. We find that the impact of data augmentation depends on the prediction task and on the metric used to evaluate the model performance. Second, we probe the contribution of different combinations of input formats, tokenization schemes, and embedding strategies to model performance. We find that less stable input settings generally lead to better performance. Lastly, we validate the superiority of round-trip accuracy over simpler evaluation metrics, such as top-k accuracy, using a committee of human experts and show a strong agreement for predictions that pass the round-trip test. This demonstrates the usefulness of more elaborate metrics in complex predictive scenarios and highlights the limitations of direct comparisons to a predefined database, which may include a limited number of chemical reaction pathways.
American Chemical Society 2023-03-23 /pmc/articles/PMC10091402/ /pubmed/36952584 http://dx.doi.org/10.1021/acs.jcim.2c01407 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Jaume-Santero, Fernando Bornet, Alban Valery, Alain Naderi, Nona Vicente Alvarez, David Proios, Dimitrios Yazdani, Anthony Bournez, Colin Fessard, Thomas Teodoro, Douglas Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios |
title | Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios |
title_full | Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios |
title_fullStr | Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios |
title_full_unstemmed | Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios |
title_short | Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios |
title_sort | transformer performance for chemical reactions: analysis of different predictive and evaluation scenarios |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10091402/ https://www.ncbi.nlm.nih.gov/pubmed/36952584 http://dx.doi.org/10.1021/acs.jcim.2c01407 |
work_keys_str_mv | AT jaumesanterofernando transformerperformanceforchemicalreactionsanalysisofdifferentpredictiveandevaluationscenarios AT bornetalban transformerperformanceforchemicalreactionsanalysisofdifferentpredictiveandevaluationscenarios AT valeryalain transformerperformanceforchemicalreactionsanalysisofdifferentpredictiveandevaluationscenarios AT naderinona transformerperformanceforchemicalreactionsanalysisofdifferentpredictiveandevaluationscenarios AT vicentealvarezdavid transformerperformanceforchemicalreactionsanalysisofdifferentpredictiveandevaluationscenarios AT proiosdimitrios transformerperformanceforchemicalreactionsanalysisofdifferentpredictiveandevaluationscenarios AT yazdanianthony transformerperformanceforchemicalreactionsanalysisofdifferentpredictiveandevaluationscenarios AT bournezcolin transformerperformanceforchemicalreactionsanalysisofdifferentpredictiveandevaluationscenarios AT fessardthomas transformerperformanceforchemicalreactionsanalysisofdifferentpredictiveandevaluationscenarios AT teodorodouglas transformerperformanceforchemicalreactionsanalysisofdifferentpredictiveandevaluationscenarios |