Cargando…

Img2Mol – accurate SMILES recognition from molecular graphical depictions

The automatic recognition of the molecular content of a molecule's graphical depiction is an extremely challenging problem that remains largely unsolved despite decades of research. Recent advances in neural machine translation enable the auto-encoding of molecular structures in a continuous ve...

Descripción completa

Detalles Bibliográficos
Autores principales: Clevert, Djork-Arné, Le, Tuan, Winter, Robin, Montanari, Floriane
Formato: Online Artículo Texto
Lenguaje:English
Publicado: The Royal Society of Chemistry 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8565361/
https://www.ncbi.nlm.nih.gov/pubmed/34760202
http://dx.doi.org/10.1039/d1sc01839f
Descripción
Sumario:The automatic recognition of the molecular content of a molecule's graphical depiction is an extremely challenging problem that remains largely unsolved despite decades of research. Recent advances in neural machine translation enable the auto-encoding of molecular structures in a continuous vector space of fixed size (latent representation) with low reconstruction errors. In this paper, we present a fast and accurate model combining deep convolutional neural network learning from molecule depictions and a pre-trained decoder that translates the latent representation into the SMILES representation of the molecules. This combination allows us to precisely infer a molecular structure from an image. Our rigorous evaluation shows that Img2Mol is able to correctly translate up to 88% of the molecular depictions into their SMILES representation. A pretrained version of Img2Mol is made publicly available on GitHub for non-commercial users.