“Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models
There is an intuitive analogy of an organic chemist's understanding of a compound and a language speaker's understanding of a word. Based on this analogy, it is possible to introduce the basic concepts and analyze potential impacts of linguistic analysis to the world of organic chemistry....
Main authors: | Schwaller, Philippe; Gaudin, Théophile; Lányi, Dávid; Bekas, Costas; Laino, Teodoro |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Royal Society of Chemistry, 2018 |
Subjects: | Chemistry |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6053976/ https://www.ncbi.nlm.nih.gov/pubmed/30090297 http://dx.doi.org/10.1039/c8sc02339e |
_version_ | 1783340928713359360 |
---|---|
author | Schwaller, Philippe; Gaudin, Théophile; Lányi, Dávid; Bekas, Costas; Laino, Teodoro |
author_sort | Schwaller, Philippe |
collection | PubMed |
description | There is an intuitive analogy of an organic chemist's understanding of a compound and a language speaker's understanding of a word. Based on this analogy, it is possible to introduce the basic concepts and analyze potential impacts of linguistic analysis to the world of organic chemistry. In this work, we cast the reaction prediction task as a translation problem by introducing a template-free sequence-to-sequence model, trained end-to-end and fully data-driven. We propose a tokenization, which is arbitrarily extensible with reaction information. Using an attention-based model borrowed from human language translation, we improve the state-of-the-art solutions in reaction prediction on the top-1 accuracy by achieving 80.3% without relying on auxiliary knowledge, such as reaction templates or explicit atomic features. Also, a top-1 accuracy of 65.4% is reached on a larger and noisier dataset. |
format | Online Article Text |
id | pubmed-6053976 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Royal Society of Chemistry |
record_format | MEDLINE/PubMed |
spelling | pubmed-6053976 2018-08-08. Chem Sci. Chemistry. Royal Society of Chemistry, 2018-06-22. /pmc/articles/PMC6053976/ /pubmed/30090297 http://dx.doi.org/10.1039/c8sc02339e Text en. This journal is © The Royal Society of Chemistry 2018. http://creativecommons.org/licenses/by-nc/3.0/ This article is freely available. This article is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported Licence (CC BY-NC 3.0). |
title | “Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models |
topic | Chemistry |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6053976/ https://www.ncbi.nlm.nih.gov/pubmed/30090297 http://dx.doi.org/10.1039/c8sc02339e |
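
The description in this record mentions a tokenization that lets a sequence-to-sequence model read a reaction like a sentence, but the record does not say how that tokenization works. The Python sketch below is therefore only an illustrative assumption, not the authors' method: it assumes reactions are written as SMILES strings and splits them into atom-level tokens with a hand-written regular expression. The names `TOKEN_PATTERN` and `tokenize`, and the example reaction, are hypothetical.

```python
import re

# Hypothetical atom-level pattern (an assumption, not taken from the article):
# bracket atoms, two-letter halogens, two-digit ring closures, single-letter
# atoms, ring-closure digits, then bond, branch, and separator symbols.
TOKEN_PATTERN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+\\/().:@>~*$]"
)


def tokenize(smiles):
    """Split a SMILES string into a list of tokens.

    Raises ValueError if some character is not covered by the pattern,
    so that no part of the input is dropped silently.
    """
    tokens = TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot tokenize {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Illustrative reaction string: reactants >> product.
    reaction = "CC(=O)Cl.OCC>>CC(=O)OCC"
    print(" ".join(tokenize(reaction)))
```

Joining the tokens with spaces gives a "sentence" that a translation model could consume token by token. The round-trip check is a design choice of this sketch: it rejects inputs containing unsupported characters instead of producing a shortened token sequence.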