
DECIMER: towards deep learning for chemical image recognition

The automatic recognition of chemical structure diagrams from the literature is an indispensable component of workflows to re-discover information about chemicals and to make it available in open-access databases. Here we report preliminary findings in our development of Deep lEarning for Chemical I...


Bibliographic Details
Main authors: Rajan, Kohulan; Zielesny, Achim; Steinbeck, Christoph
Format: Online Article Text
Language: English
Published: Springer International Publishing 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7590713/
https://www.ncbi.nlm.nih.gov/pubmed/33372621
http://dx.doi.org/10.1186/s13321-020-00469-w
author Rajan, Kohulan
Zielesny, Achim
Steinbeck, Christoph
collection PubMed
description The automatic recognition of chemical structure diagrams from the literature is an indispensable component of workflows to re-discover information about chemicals and to make it available in open-access databases. Here we report preliminary findings in our development of Deep lEarning for Chemical ImagE Recognition (DECIMER), a deep learning method based on existing show-and-tell deep neural networks, which makes very few assumptions about the structure of the underlying problem. It translates a bitmap image of a molecule, as found in publications, into a SMILES. The training state reported here does not yet rival the performance of existing traditional approaches, but we present evidence that our method will reach a comparable detection power with sufficient training time. Training success of DECIMER depends on the input data representation: DeepSMILES are superior over SMILES and we have a preliminary indication that the recently reported SELFIES outperform DeepSMILES. An extrapolation of our results towards larger training data sizes suggests that we might be able to achieve near-accurate prediction with 50 to 100 million training structures. This work is entirely based on open-source software and open data and is available to the general public for any purpose.
format Online
Article
Text
id pubmed-7590713
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-7590713 2020-10-27 DECIMER: towards deep learning for chemical image recognition Rajan, Kohulan Zielesny, Achim Steinbeck, Christoph J Cheminform Preliminary Communication The automatic recognition of chemical structure diagrams from the literature is an indispensable component of workflows to re-discover information about chemicals and to make it available in open-access databases. Here we report preliminary findings in our development of Deep lEarning for Chemical ImagE Recognition (DECIMER), a deep learning method based on existing show-and-tell deep neural networks, which makes very few assumptions about the structure of the underlying problem. It translates a bitmap image of a molecule, as found in publications, into a SMILES. The training state reported here does not yet rival the performance of existing traditional approaches, but we present evidence that our method will reach a comparable detection power with sufficient training time. Training success of DECIMER depends on the input data representation: DeepSMILES are superior over SMILES and we have a preliminary indication that the recently reported SELFIES outperform DeepSMILES. An extrapolation of our results towards larger training data sizes suggests that we might be able to achieve near-accurate prediction with 50 to 100 million training structures. This work is entirely based on open-source software and open data and is available to the general public for any purpose. Springer International Publishing 2020-10-27 /pmc/articles/PMC7590713/ /pubmed/33372621 http://dx.doi.org/10.1186/s13321-020-00469-w Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title DECIMER: towards deep learning for chemical image recognition
topic Preliminary Communication