
Extraction of organic chemistry grammar from unsupervised learning of chemical reactions

Humans use different domain languages to represent, explore, and communicate scientific concepts. During the last few hundred years, chemists compiled the language of chemical synthesis inferring a series of “reaction rules” from knowing how atoms rearrange during a chemical transformation, a proces...


Bibliographic Details
Main authors: Schwaller, Philippe, Hoover, Benjamin, Reymond, Jean-Louis, Strobelt, Hendrik, Laino, Teodoro
Format: Online Article Text
Language: English
Published: American Association for the Advancement of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8026122/
https://www.ncbi.nlm.nih.gov/pubmed/33827815
http://dx.doi.org/10.1126/sciadv.abe4166
author Schwaller, Philippe
Hoover, Benjamin
Reymond, Jean-Louis
Strobelt, Hendrik
Laino, Teodoro
collection PubMed
description Humans use different domain languages to represent, explore, and communicate scientific concepts. During the last few hundred years, chemists compiled the language of chemical synthesis inferring a series of “reaction rules” from knowing how atoms rearrange during a chemical transformation, a process called atom-mapping. Atom-mapping is a laborious experimental task and, when tackled with computational methods, requires continuous annotation of chemical reactions and the extension of logically consistent directives. Here, we demonstrate that Transformer Neural Networks learn atom-mapping information between products and reactants without supervision or human labeling. Using the Transformer attention weights, we build a chemically agnostic, attention-guided reaction mapper and extract coherent chemical grammar from unannotated sets of reactions. Our method shows remarkable performance in terms of accuracy and speed, even for strongly imbalanced and chemically complex reactions with nontrivial atom-mapping. It provides the missing link between data-driven and rule-based approaches for numerous chemical reaction tasks.
format Online
Article
Text
id pubmed-8026122
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher American Association for the Advancement of Science
record_format MEDLINE/PubMed
spelling pubmed-8026122 2021-04-21 Extraction of organic chemistry grammar from unsupervised learning of chemical reactions Schwaller, Philippe Hoover, Benjamin Reymond, Jean-Louis Strobelt, Hendrik Laino, Teodoro Sci Adv Research Articles Humans use different domain languages to represent, explore, and communicate scientific concepts. During the last few hundred years, chemists compiled the language of chemical synthesis inferring a series of “reaction rules” from knowing how atoms rearrange during a chemical transformation, a process called atom-mapping. Atom-mapping is a laborious experimental task and, when tackled with computational methods, requires continuous annotation of chemical reactions and the extension of logically consistent directives. Here, we demonstrate that Transformer Neural Networks learn atom-mapping information between products and reactants without supervision or human labeling. Using the Transformer attention weights, we build a chemically agnostic, attention-guided reaction mapper and extract coherent chemical grammar from unannotated sets of reactions. Our method shows remarkable performance in terms of accuracy and speed, even for strongly imbalanced and chemically complex reactions with nontrivial atom-mapping. It provides the missing link between data-driven and rule-based approaches for numerous chemical reaction tasks. American Association for the Advancement of Science 2021-04-07 /pmc/articles/PMC8026122/ /pubmed/33827815 http://dx.doi.org/10.1126/sciadv.abe4166 Text en Copyright © 2021 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).
https://creativecommons.org/licenses/by-nc/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license (https://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.
title Extraction of organic chemistry grammar from unsupervised learning of chemical reactions
topic Research Articles