
Img2Mol – accurate SMILES recognition from molecular graphical depictions

The automatic recognition of the molecular content of a molecule's graphical depiction is an extremely challenging problem that remains largely unsolved despite decades of research. Recent advances in neural machine translation enable the auto-encoding of molecular structures in a continuous ve...


Bibliographic Details
Main authors: Clevert, Djork-Arné; Le, Tuan; Winter, Robin; Montanari, Floriane
Format: Online Article Text
Language: English
Published: The Royal Society of Chemistry 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8565361/
https://www.ncbi.nlm.nih.gov/pubmed/34760202
http://dx.doi.org/10.1039/d1sc01839f
_version_ 1784593809676435456
author Clevert, Djork-Arné
Le, Tuan
Winter, Robin
Montanari, Floriane
author_facet Clevert, Djork-Arné
Le, Tuan
Winter, Robin
Montanari, Floriane
author_sort Clevert, Djork-Arné
collection PubMed
description The automatic recognition of the molecular content of a molecule's graphical depiction is an extremely challenging problem that remains largely unsolved despite decades of research. Recent advances in neural machine translation enable the auto-encoding of molecular structures in a continuous vector space of fixed size (latent representation) with low reconstruction errors. In this paper, we present a fast and accurate model combining deep convolutional neural network learning from molecule depictions and a pre-trained decoder that translates the latent representation into the SMILES representation of the molecules. This combination allows us to precisely infer a molecular structure from an image. Our rigorous evaluation shows that Img2Mol is able to correctly translate up to 88% of the molecular depictions into their SMILES representation. A pretrained version of Img2Mol is made publicly available on GitHub for non-commercial users.
format Online
Article
Text
id pubmed-8565361
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher The Royal Society of Chemistry
record_format MEDLINE/PubMed
spelling pubmed-85653612021-11-09 Img2Mol – accurate SMILES recognition from molecular graphical depictions Clevert, Djork-Arné Le, Tuan Winter, Robin Montanari, Floriane Chem Sci Chemistry The automatic recognition of the molecular content of a molecule's graphical depiction is an extremely challenging problem that remains largely unsolved despite decades of research. Recent advances in neural machine translation enable the auto-encoding of molecular structures in a continuous vector space of fixed size (latent representation) with low reconstruction errors. In this paper, we present a fast and accurate model combining deep convolutional neural network learning from molecule depictions and a pre-trained decoder that translates the latent representation into the SMILES representation of the molecules. This combination allows us to precisely infer a molecular structure from an image. Our rigorous evaluation shows that Img2Mol is able to correctly translate up to 88% of the molecular depictions into their SMILES representation. A pretrained version of Img2Mol is made publicly available on GitHub for non-commercial users. The Royal Society of Chemistry 2021-09-29 /pmc/articles/PMC8565361/ /pubmed/34760202 http://dx.doi.org/10.1039/d1sc01839f Text en This journal is © The Royal Society of Chemistry https://creativecommons.org/licenses/by-nc/3.0/
spellingShingle Chemistry
Clevert, Djork-Arné
Le, Tuan
Winter, Robin
Montanari, Floriane
Img2Mol – accurate SMILES recognition from molecular graphical depictions
title Img2Mol – accurate SMILES recognition from molecular graphical depictions
title_full Img2Mol – accurate SMILES recognition from molecular graphical depictions
title_fullStr Img2Mol – accurate SMILES recognition from molecular graphical depictions
title_full_unstemmed Img2Mol – accurate SMILES recognition from molecular graphical depictions
title_short Img2Mol – accurate SMILES recognition from molecular graphical depictions
title_sort img2mol – accurate smiles recognition from molecular graphical depictions
topic Chemistry
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8565361/
https://www.ncbi.nlm.nih.gov/pubmed/34760202
http://dx.doi.org/10.1039/d1sc01839f