Reconstruction of lossless molecular representations from fingerprints

The simplified molecular-input line-entry system (SMILES) is the most prevalent molecular representation used in AI-based chemical applications. However, there are innate limitations associated with the internal structure of SMILES representations. In this context, this study exploits the resolution...

Bibliographic Details
Main authors: Ucak, Umit V.; Ashyrmamatov, Islambek; Lee, Juyong
Format: Online Article Text
Language: English
Published: Springer International Publishing 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9948316/
https://www.ncbi.nlm.nih.gov/pubmed/36823647
http://dx.doi.org/10.1186/s13321-023-00693-0
_version_ 1784892755430866944
author Ucak, Umit V.
Ashyrmamatov, Islambek
Lee, Juyong
collection PubMed
description The simplified molecular-input line-entry system (SMILES) is the most prevalent molecular representation used in AI-based chemical applications. However, there are innate limitations associated with the internal structure of SMILES representations. In this context, this study exploits the resolution and robustness of unique molecular representations, i.e., SMILES and SELFIES (SELF-referencIng Embedded strings), reconstructed from a set of structural fingerprints, which are proposed and used herein as vital representational tools for chemical and natural language processing (NLP) applications. This is achieved by restoring, with high accuracy, the connectivity information lost during fingerprint transformation. Notably, the results reveal that reversing the seemingly irreversible molecule-to-fingerprint conversion is feasible. More specifically, four structural fingerprints (extended connectivity, topological torsion, atom pairs, and atomic environments) can be used as inputs and outputs of chemical NLP applications. Therefore, this comprehensive study addresses the major limitation of structural fingerprints that precludes their use in NLP models. Our findings will facilitate the development of text- or fingerprint-based chemoinformatic models for generative and translational tasks. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-023-00693-0.
format Online
Article
Text
id pubmed-9948316
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-9948316 2023-02-24 Reconstruction of lossless molecular representations from fingerprints Ucak, Umit V. Ashyrmamatov, Islambek Lee, Juyong J Cheminform Research
Springer International Publishing 2023-02-23 /pmc/articles/PMC9948316/ /pubmed/36823647 http://dx.doi.org/10.1186/s13321-023-00693-0 Text en © The Author(s) 2023, corrected publication 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Reconstruction of lossless molecular representations from fingerprints
topic Research