“Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models

There is an intuitive analogy of an organic chemist's understanding of a compound and a language speaker's understanding of a word. Based on this analogy, it is possible to introduce the basic concepts and analyze potential impacts of linguistic analysis to the world of organic chemistry....

Bibliographic Details
Main Authors: Schwaller, Philippe, Gaudin, Théophile, Lányi, Dávid, Bekas, Costas, Laino, Teodoro
Format: Online Article Text
Language: English
Published: Royal Society of Chemistry 2018
Subjects: Chemistry
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6053976/
https://www.ncbi.nlm.nih.gov/pubmed/30090297
http://dx.doi.org/10.1039/c8sc02339e
_version_ 1783340928713359360
author Schwaller, Philippe
Gaudin, Théophile
Lányi, Dávid
Bekas, Costas
Laino, Teodoro
author_sort Schwaller, Philippe
collection PubMed
description There is an intuitive analogy of an organic chemist's understanding of a compound and a language speaker's understanding of a word. Based on this analogy, it is possible to introduce the basic concepts and analyze potential impacts of linguistic analysis to the world of organic chemistry. In this work, we cast the reaction prediction task as a translation problem by introducing a template-free sequence-to-sequence model, trained end-to-end and fully data-driven. We propose a tokenization, which is arbitrarily extensible with reaction information. Using an attention-based model borrowed from human language translation, we improve the state-of-the-art solutions in reaction prediction on the top-1 accuracy by achieving 80.3% without relying on auxiliary knowledge, such as reaction templates or explicit atomic features. Also, a top-1 accuracy of 65.4% is reached on a larger and noisier dataset.
format Online
Article
Text
id pubmed-6053976
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Royal Society of Chemistry
record_format MEDLINE/PubMed
spelling pubmed-6053976 2018-08-08 Chem Sci Chemistry Royal Society of Chemistry 2018-06-22 /pmc/articles/PMC6053976/ /pubmed/30090297 http://dx.doi.org/10.1039/c8sc02339e Text en This journal is © The Royal Society of Chemistry 2018 http://creativecommons.org/licenses/by-nc/3.0/ This article is freely available. This article is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported Licence (CC BY-NC 3.0)
title “Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models
topic Chemistry