RanDepict: Random chemical structure depiction generator

The development of deep learning-based optical chemical structure recognition (OCSR) systems has led to a need for datasets of chemical structure depictions. The diversity of the features in the training data is an important factor for the generation of deep learning systems that generalise well and...

Bibliographic Details
Main Authors: Brinkhaus, Henning Otto; Rajan, Kohulan; Zielesny, Achim; Steinbeck, Christoph
Format: Online Article Text
Language: English
Published: Springer International Publishing 2022
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9169273/
https://www.ncbi.nlm.nih.gov/pubmed/35668480
http://dx.doi.org/10.1186/s13321-022-00609-4
_version_ 1784721171564986368
author Brinkhaus, Henning Otto
Rajan, Kohulan
Zielesny, Achim
Steinbeck, Christoph
author_facet Brinkhaus, Henning Otto
Rajan, Kohulan
Zielesny, Achim
Steinbeck, Christoph
author_sort Brinkhaus, Henning Otto
collection PubMed
description The development of deep learning-based optical chemical structure recognition (OCSR) systems has led to a need for datasets of chemical structure depictions. The diversity of the features in the training data is an important factor for the generation of deep learning systems that generalise well and are not overfit to a specific type of input. In the case of chemical structure depictions, these features are defined by the depiction parameters such as bond length, line thickness, label font style and many others. Here we present RanDepict, a toolkit for the creation of diverse sets of chemical structure depictions. The diversity of the image features is generated by making use of all available depiction parameters in the depiction functionalities of the CDK, RDKit, and Indigo. Furthermore, there is the option to enhance and augment the image with features such as curved arrows, chemical labels around the structure, or other kinds of distortions. Using depiction feature fingerprints, RanDepict ensures diversely picked image features. Here, the depiction and augmentation features are summarised in binary vectors and the MaxMin algorithm is used to pick diverse samples out of all valid options. By making all resources described herein publicly available, we hope to contribute to the development of deep learning-based OCSR systems.
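The abstract states that depiction and augmentation features are summarised in binary vectors and that the MaxMin algorithm picks diverse samples from them. The following is a minimal plain-Python sketch of that general idea, not RanDepict's own code: the function names `tanimoto_distance` and `maxmin_pick`, the use of Tanimoto distance, the fixed starting vector, and the toy fingerprints are all assumptions made for illustration.

```python
from typing import List, Sequence


def tanimoto_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Return 1 - Tanimoto similarity for two equal-length binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero vectors are treated as identical (assumption).
    return 0.0 if either == 0 else 1.0 - both / either


def maxmin_pick(
    fingerprints: Sequence[Sequence[int]], n_pick: int, seed_index: int = 0
) -> List[int]:
    """Greedy MaxMin selection: return indices of n_pick diverse vectors.

    Starting from seed_index (an assumed, arbitrary start), each step adds
    the candidate whose distance to its nearest already-picked vector is
    largest.
    """
    if not 0 < n_pick <= len(fingerprints):
        raise ValueError("n_pick must be between 1 and len(fingerprints)")
    picked = [seed_index]
    # min_dist[i]: distance from vector i to the closest picked vector so far.
    min_dist = [
        tanimoto_distance(fp, fingerprints[seed_index]) for fp in fingerprints
    ]
    while len(picked) < n_pick:
        candidates = [i for i in range(len(fingerprints)) if i not in picked]
        next_index = max(candidates, key=lambda i: min_dist[i])
        picked.append(next_index)
        # Update nearest-picked distances with the newly added vector.
        for i, fp in enumerate(fingerprints):
            d = tanimoto_distance(fp, fingerprints[next_index])
            min_dist[i] = min(min_dist[i], d)
    return picked


# Toy binary "depiction feature fingerprints" (hypothetical data).
toy_fingerprints = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
print(maxmin_pick(toy_fingerprints, 3))  # prints [0, 2, 3]
```

In this toy run the second vector is skipped because it is nearly identical to the seed, which is the behaviour a diversity picker is meant to show. For real workloads a library implementation of MaxMin picking would likely be preferable to this quadratic sketch.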
format Online
Article
Text
id pubmed-9169273
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-9169273 2022-06-07 RanDepict: Random chemical structure depiction generator Brinkhaus, Henning Otto Rajan, Kohulan Zielesny, Achim Steinbeck, Christoph J Cheminform Software Springer International Publishing 2022-06-06 /pmc/articles/PMC9169273/ /pubmed/35668480 http://dx.doi.org/10.1186/s13321-022-00609-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Software
Brinkhaus, Henning Otto
Rajan, Kohulan
Zielesny, Achim
Steinbeck, Christoph
RanDepict: Random chemical structure depiction generator
title RanDepict: Random chemical structure depiction generator
title_full RanDepict: Random chemical structure depiction generator
title_fullStr RanDepict: Random chemical structure depiction generator
title_full_unstemmed RanDepict: Random chemical structure depiction generator
title_short RanDepict: Random chemical structure depiction generator
title_sort randepict: random chemical structure depiction generator
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9169273/
https://www.ncbi.nlm.nih.gov/pubmed/35668480
http://dx.doi.org/10.1186/s13321-022-00609-4
work_keys_str_mv AT brinkhaushenningotto randepictrandomchemicalstructuredepictiongenerator
AT rajankohulan randepictrandomchemicalstructuredepictiongenerator
AT zielesnyachim randepictrandomchemicalstructuredepictiongenerator
AT steinbeckchristoph randepictrandomchemicalstructuredepictiongenerator