DECIMER 1.0: deep learning for chemical image recognition using transformers
The amount of data available on chemical structures and their properties has increased steadily over the past decades. In particular, articles published before the mid-1990s are available only in printed or scanned form. The extraction and storage of data from those articles in a publicly accessible...
Main authors: | Rajan, Kohulan; Zielesny, Achim; Steinbeck, Christoph |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8369700/ https://www.ncbi.nlm.nih.gov/pubmed/34404468 http://dx.doi.org/10.1186/s13321-021-00538-8 |
_version_ | 1783739341726547968 |
---|---|
author | Rajan, Kohulan Zielesny, Achim Steinbeck, Christoph |
author_facet | Rajan, Kohulan Zielesny, Achim Steinbeck, Christoph |
author_sort | Rajan, Kohulan |
collection | PubMed |
description | The amount of data available on chemical structures and their properties has increased steadily over the past decades. In particular, articles published before the mid-1990s are available only in printed or scanned form. The extraction and storage of data from those articles in a publicly accessible database are desirable, but doing this manually is a slow and error-prone process. In order to extract chemical structure depictions and convert them into a computer-readable format, Optical Chemical Structure Recognition (OCSR) tools were developed; the best-performing OCSR tools are mostly rule-based. The DECIMER (Deep lEarning for Chemical ImagE Recognition) project was launched to address the OCSR problem with the latest computational intelligence methods to provide an automated open-source software solution. Various current deep learning approaches were explored to seek a best-fitting solution to the problem. In a preliminary communication, we outlined the prospect of being able to predict SMILES encodings of chemical structure depictions with about 90% accuracy using a dataset of 50–100 million molecules. In this article, the new DECIMER model is presented, a transformer-based network, which can predict SMILES with above 96% accuracy from depictions of chemical structures without stereochemical information and above 89% accuracy for depictions with stereochemical information. |
format | Online Article Text |
id | pubmed-8369700 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-8369700 2021-08-18 DECIMER 1.0: deep learning for chemical image recognition using transformers Rajan, Kohulan Zielesny, Achim Steinbeck, Christoph J Cheminform Research Article The amount of data available on chemical structures and their properties has increased steadily over the past decades. In particular, articles published before the mid-1990s are available only in printed or scanned form. The extraction and storage of data from those articles in a publicly accessible database are desirable, but doing this manually is a slow and error-prone process. In order to extract chemical structure depictions and convert them into a computer-readable format, Optical Chemical Structure Recognition (OCSR) tools were developed; the best-performing OCSR tools are mostly rule-based. The DECIMER (Deep lEarning for Chemical ImagE Recognition) project was launched to address the OCSR problem with the latest computational intelligence methods to provide an automated open-source software solution. Various current deep learning approaches were explored to seek a best-fitting solution to the problem. In a preliminary communication, we outlined the prospect of being able to predict SMILES encodings of chemical structure depictions with about 90% accuracy using a dataset of 50–100 million molecules. In this article, the new DECIMER model is presented, a transformer-based network, which can predict SMILES with above 96% accuracy from depictions of chemical structures without stereochemical information and above 89% accuracy for depictions with stereochemical information. Springer International Publishing 2021-08-17 /pmc/articles/PMC8369700/ /pubmed/34404468 http://dx.doi.org/10.1186/s13321-021-00538-8 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article Rajan, Kohulan Zielesny, Achim Steinbeck, Christoph DECIMER 1.0: deep learning for chemical image recognition using transformers |
title | DECIMER 1.0: deep learning for chemical image recognition using transformers |
title_full | DECIMER 1.0: deep learning for chemical image recognition using transformers |
title_fullStr | DECIMER 1.0: deep learning for chemical image recognition using transformers |
title_full_unstemmed | DECIMER 1.0: deep learning for chemical image recognition using transformers |
title_short | DECIMER 1.0: deep learning for chemical image recognition using transformers |
title_sort | decimer 1.0: deep learning for chemical image recognition using transformers |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8369700/ https://www.ncbi.nlm.nih.gov/pubmed/34404468 http://dx.doi.org/10.1186/s13321-021-00538-8 |
work_keys_str_mv | AT rajankohulan decimer10deeplearningforchemicalimagerecognitionusingtransformers AT zielesnyachim decimer10deeplearningforchemicalimagerecognitionusingtransformers AT steinbeckchristoph decimer10deeplearningforchemicalimagerecognitionusingtransformers |