DECIMER: towards deep learning for chemical image recognition
The automatic recognition of chemical structure diagrams from the literature is an indispensable component of workflows to re-discover information about chemicals and to make it available in open-access databases. Here we report preliminary findings in our development of Deep lEarning for Chemical I...
Main Authors: | Rajan, Kohulan; Zielesny, Achim; Steinbeck, Christoph |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2020 |
Subjects: | Preliminary Communication |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7590713/ https://www.ncbi.nlm.nih.gov/pubmed/33372621 http://dx.doi.org/10.1186/s13321-020-00469-w |
_version_ | 1783600858694418432 |
---|---|
author | Rajan, Kohulan; Zielesny, Achim; Steinbeck, Christoph |
author_facet | Rajan, Kohulan; Zielesny, Achim; Steinbeck, Christoph |
author_sort | Rajan, Kohulan |
collection | PubMed |
description | The automatic recognition of chemical structure diagrams from the literature is an indispensable component of workflows to re-discover information about chemicals and to make it available in open-access databases. Here we report preliminary findings in our development of Deep lEarning for Chemical ImagE Recognition (DECIMER), a deep learning method based on existing show-and-tell deep neural networks, which makes very few assumptions about the structure of the underlying problem. It translates a bitmap image of a molecule, as found in publications, into a SMILES string. The training state reported here does not yet rival the performance of existing traditional approaches, but we present evidence that our method will reach a comparable detection power with sufficient training time. Training success of DECIMER depends on the input data representation: DeepSMILES are superior to SMILES, and we have a preliminary indication that the recently reported SELFIES outperform DeepSMILES. An extrapolation of our results towards larger training data sizes suggests that we might be able to achieve near-accurate prediction with 50 to 100 million training structures. This work is entirely based on open-source software and open data and is available to the general public for any purpose. |
format | Online Article Text |
id | pubmed-7590713 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-7590713 2020-10-27 DECIMER: towards deep learning for chemical image recognition Rajan, Kohulan; Zielesny, Achim; Steinbeck, Christoph J Cheminform Preliminary Communication The automatic recognition of chemical structure diagrams from the literature is an indispensable component of workflows to re-discover information about chemicals and to make it available in open-access databases. Here we report preliminary findings in our development of Deep lEarning for Chemical ImagE Recognition (DECIMER), a deep learning method based on existing show-and-tell deep neural networks, which makes very few assumptions about the structure of the underlying problem. It translates a bitmap image of a molecule, as found in publications, into a SMILES string. The training state reported here does not yet rival the performance of existing traditional approaches, but we present evidence that our method will reach a comparable detection power with sufficient training time. Training success of DECIMER depends on the input data representation: DeepSMILES are superior to SMILES, and we have a preliminary indication that the recently reported SELFIES outperform DeepSMILES. An extrapolation of our results towards larger training data sizes suggests that we might be able to achieve near-accurate prediction with 50 to 100 million training structures. This work is entirely based on open-source software and open data and is available to the general public for any purpose. Springer International Publishing 2020-10-27 /pmc/articles/PMC7590713/ /pubmed/33372621 http://dx.doi.org/10.1186/s13321-020-00469-w Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Preliminary Communication Rajan, Kohulan; Zielesny, Achim; Steinbeck, Christoph DECIMER: towards deep learning for chemical image recognition |
title | DECIMER: towards deep learning for chemical image recognition |
title_full | DECIMER: towards deep learning for chemical image recognition |
title_fullStr | DECIMER: towards deep learning for chemical image recognition |
title_full_unstemmed | DECIMER: towards deep learning for chemical image recognition |
title_short | DECIMER: towards deep learning for chemical image recognition |
title_sort | decimer: towards deep learning for chemical image recognition |
topic | Preliminary Communication |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7590713/ https://www.ncbi.nlm.nih.gov/pubmed/33372621 http://dx.doi.org/10.1186/s13321-020-00469-w |
work_keys_str_mv | AT rajankohulan decimertowardsdeeplearningforchemicalimagerecognition AT zielesnyachim decimertowardsdeeplearningforchemicalimagerecognition AT steinbeckchristoph decimertowardsdeeplearningforchemicalimagerecognition |