
DECIMER 1.0: deep learning for chemical image recognition using transformers

The amount of data available on chemical structures and their properties has increased steadily over the past decades. In particular, articles published before the mid-1990s are available only in printed or scanned form. The extraction and storage of data from those articles in a publicly accessible...


Bibliographic Details
Main authors: Rajan, Kohulan; Zielesny, Achim; Steinbeck, Christoph
Format: Online Article Text
Language: English
Published: Springer International Publishing, 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8369700/
https://www.ncbi.nlm.nih.gov/pubmed/34404468
http://dx.doi.org/10.1186/s13321-021-00538-8
author Rajan, Kohulan
Zielesny, Achim
Steinbeck, Christoph
collection PubMed
description The amount of data available on chemical structures and their properties has increased steadily over the past decades. In particular, articles published before the mid-1990s are available only in printed or scanned form. The extraction and storage of data from those articles in a publicly accessible database are desirable, but doing this manually is a slow and error-prone process. To extract chemical structure depictions and convert them into a computer-readable format, Optical Chemical Structure Recognition (OCSR) tools were developed; the best-performing OCSR tools are mostly rule-based. The DECIMER (Deep lEarning for Chemical ImagE Recognition) project was launched to address the OCSR problem with the latest computational intelligence methods and to provide an automated open-source software solution. Various current deep learning approaches were explored to find the best-fitting solution to the problem. In a preliminary communication, we outlined the prospect of being able to predict SMILES encodings of chemical structure depictions with about 90% accuracy using a dataset of 50–100 million molecules. In this article, the new DECIMER model is presented: a transformer-based network that can predict SMILES with above 96% accuracy from depictions of chemical structures without stereochemical information and above 89% accuracy for depictions with stereochemical information.
format Online
Article
Text
id pubmed-8369700
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-8369700 2021-08-18 DECIMER 1.0: deep learning for chemical image recognition using transformers. Rajan, Kohulan; Zielesny, Achim; Steinbeck, Christoph. J Cheminform. Research Article. Springer International Publishing 2021-08-17 /pmc/articles/PMC8369700/ /pubmed/34404468 http://dx.doi.org/10.1186/s13321-021-00538-8 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title DECIMER 1.0: deep learning for chemical image recognition using transformers
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8369700/
https://www.ncbi.nlm.nih.gov/pubmed/34404468
http://dx.doi.org/10.1186/s13321-021-00538-8