RanDepict: Random chemical structure depiction generator
The development of deep learning-based optical chemical structure recognition (OCSR) systems has led to a need for datasets of chemical structure depictions. The diversity of the features in the training data is an important factor for the generation of deep learning systems that generalise well and...
Main authors: | Brinkhaus, Henning Otto; Rajan, Kohulan; Zielesny, Achim; Steinbeck, Christoph |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2022 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9169273/ https://www.ncbi.nlm.nih.gov/pubmed/35668480 http://dx.doi.org/10.1186/s13321-022-00609-4 |
_version_ | 1784721171564986368 |
---|---|
author | Brinkhaus, Henning Otto Rajan, Kohulan Zielesny, Achim Steinbeck, Christoph |
author_sort | Brinkhaus, Henning Otto |
collection | PubMed |
description | The development of deep learning-based optical chemical structure recognition (OCSR) systems has led to a need for datasets of chemical structure depictions. The diversity of the features in the training data is an important factor for the generation of deep learning systems that generalise well and are not overfit to a specific type of input. In the case of chemical structure depictions, these features are defined by the depiction parameters such as bond length, line thickness, label font style and many others. Here we present RanDepict, a toolkit for the creation of diverse sets of chemical structure depictions. The diversity of the image features is generated by making use of all available depiction parameters in the depiction functionalities of the CDK, RDKit, and Indigo. Furthermore, there is the option to enhance and augment the image with features such as curved arrows, chemical labels around the structure, or other kinds of distortions. Using depiction feature fingerprints, RanDepict ensures diversely picked image features. Here, the depiction and augmentation features are summarised in binary vectors and the MaxMin algorithm is used to pick diverse samples out of all valid options. By making all resources described herein publicly available, we hope to contribute to the development of deep learning-based OCSR systems. |
format | Online Article Text |
id | pubmed-9169273 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-9169273 2022-06-07. RanDepict: Random chemical structure depiction generator. Brinkhaus, Henning Otto; Rajan, Kohulan; Zielesny, Achim; Steinbeck, Christoph. J Cheminform. Software. Springer International Publishing, 2022-06-06. /pmc/articles/PMC9169273/ /pubmed/35668480 http://dx.doi.org/10.1186/s13321-022-00609-4 Text en. © The Author(s) 2022. https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | RanDepict: Random chemical structure depiction generator |
topic | Software |
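
The description in this record says that depiction and augmentation features are summarised in binary vectors and that the MaxMin algorithm picks diverse samples from the valid options. As a rough illustration of that idea only, the sketch below runs a greedy MaxMin selection over binary vectors in plain Python. The function names `hamming` and `maxmin_pick`, the use of Hamming distance, and the fixed starting index are assumptions made for this example. The record does not state which distance measure or seeding RanDepict uses, and this is not the toolkit's own implementation.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have equal length")
    return sum(x != y for x, y in zip(a, b))


def maxmin_pick(
    fingerprints: Sequence[Sequence[int]],
    n_pick: int,
    seed_index: int = 0,  # assumption: the first pick is supplied by the caller
) -> List[int]:
    """Greedy MaxMin: return the indices of n_pick mutually distant fingerprints."""
    n_total = len(fingerprints)
    if not 0 < n_pick <= n_total:
        raise ValueError("n_pick must be between 1 and len(fingerprints)")
    if not 0 <= seed_index < n_total:
        raise ValueError("seed_index is out of range")

    picked = [seed_index]
    # min_dist[i] is the distance from candidate i to its nearest picked vector.
    min_dist = [hamming(fp, fingerprints[seed_index]) for fp in fingerprints]

    while len(picked) < n_pick:
        remaining = [i for i in range(n_total) if i not in picked]
        # Take the candidate whose nearest picked neighbour is farthest away.
        # max() keeps the first maximum, so ties go to the lowest index.
        next_index = max(remaining, key=lambda i: min_dist[i])
        picked.append(next_index)
        # Refresh nearest-picked distances against the newly added vector.
        for i, fp in enumerate(fingerprints):
            min_dist[i] = min(min_dist[i], hamming(fp, fingerprints[next_index]))

    return picked
```

Each position in a vector would stand for one depiction or augmentation feature being switched on or off. Calling `maxmin_pick(vectors, 3)` returns three indices, starting from `seed_index`, chosen so that every new pick is as far as possible from the ones already selected.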